Sound

Authors and titles for recent submissions, skipping first 34

[ total of 29 entries: 1-25 | 5-29 ]

Wed, 4 Dec 2019

[5]  arXiv:1912.01231 [pdf, other]
Title: HI-MIA : A Far-field Text-Dependent Speaker Verification Database and the Baselines
Comments: Submitted to ICASSP 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6]  arXiv:1912.01219 [pdf, other]
Title: WaveFlow: A Compact Flow-based Model for Raw Audio
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7]  arXiv:1912.01542 (cross-list from eess.SP) [pdf]
Title: Design of an algorithm for acoustic signal detection of moving vehicles
Comments: 5 pages, 5 figures
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8]  arXiv:1912.01167 (cross-list from eess.AS) [pdf, other]
Title: High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Tue, 3 Dec 2019

[9]  arXiv:1912.00766 [pdf, other]
Title: Three Orthogonal Dimensions for Psychoacoustic Sonification
Comments: Keywords: Auditory Display, Audition, Noise/acoustics, Sound Design, Interpretability
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[10]  arXiv:1912.00955 (cross-list from cs.CL) [pdf, other]
Title: Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection
Comments: Submitted for ICASSP 2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11]  arXiv:1912.00938 (cross-list from eess.AS) [pdf]
Title: Speaker detection in the wild: Lessons learned from JSALT 2019
Comments: Submitted to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12]  arXiv:1912.00866 (cross-list from q-bio.NC) [pdf]
Title: Voice Biomarker Identification for Effects of Deep-Brain Stimulation on Parkinson's Disease
Comments: 5 pages, including 3 tables, 2 figures, and references
Subjects: Neurons and Cognition (q-bio.NC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13]  arXiv:1912.00846 (cross-list from cs.LG) [pdf, other]
Title: Attentive Modality Hopping Mechanism for Speech Emotion Recognition
Comments: 6 pages. arXiv admin note: text overlap with arXiv:1904.10788
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Machine Learning (stat.ML)

Mon, 2 Dec 2019

[14]  arXiv:1911.13254 [pdf, other]
Title: Music Source Separation in the Waveform Domain
Authors: Alexandre Défossez (FAIR, SIERRA, PSL), Nicolas Usunier (FAIR), Léon Bottou (FAIR), Francis Bach (DI-ENS, PSL, SIERRA)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[15]  arXiv:1911.12928 [pdf, other]
Title: Improving Voice Separation by Incorporating End-to-end Speech Recognition
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16]  arXiv:1911.12926 [pdf, other]
Title: J-Net: Randomly weighted U-Net for audio source separation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17]  arXiv:1911.12618 [pdf, other]
Title: Machine learning for music genre: multifaceted review and experimentation with audioset
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18]  arXiv:1911.13087 (cross-list from cs.CL) [pdf, other]
Title: Kurdish (Sorani) Speech to Text: Presenting an Experimental Dataset
Comments: 4 pages, 1 figure
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19]  arXiv:1911.12760 (cross-list from cs.LG) [pdf, other]
Title: Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech
Comments: Submitted to ICASSP 2020
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[20]  arXiv:1911.12747 (cross-list from cs.CV) [pdf, other]
Title: ASR is all you need: cross-modal distillation for lip reading
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:1911.12617 (cross-list from eess.AS) [pdf, other]
Title: Unsupervised Neural Mask Estimator For Generalized Eigen-Value Beamforming Based ASR
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[22]  arXiv:1911.12616 (cross-list from eess.AS) [pdf, other]
Title: Performance Comparison of UCA and UCCA based Real-time Sound Source Localization Systems using Circular Harmonics SRP Method
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23]  arXiv:1911.12505 (cross-list from cs.LG) [pdf, ps, other]
Title: Augmentation Methods on Monophonic Audio for Instrument Classification in Polyphonic Music
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[24]  arXiv:1911.12487 (cross-list from cs.CL) [pdf, other]
Title: Minimum Bayes Risk Training of RNN-Transducer for End-to-End Speech Recognition
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 28 Nov 2019

[25]  arXiv:1911.11879 [pdf, other]
Title: SchrödingeRNN: Generative Modeling of Raw Audio as a Continuously Observed Quantum State
Comments: 32 pages, 20 figures, under review for MSML 2020
Subjects: Sound (cs.SD); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[26]  arXiv:1911.11775 [pdf]
Title: Improving Polyphonic Music Models with Feature-Rich Encoding
Authors: Omar Peracha
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[27]  arXiv:1911.12361 (cross-list from cs.CV) [pdf, ps, other]
Title: GLA in MediaEval 2018 Emotional Impact of Movies Task
Comments: MediaEval 2018, 29-31 October 2018, Sophia Antipolis, France. This work is presented at the workshop in MediaEval 2018 for the Emotional Impact of Movies Task
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[28]  arXiv:1911.11935 (cross-list from cs.CL) [pdf, other]
Title: AIPNet: Generative Adversarial Pre-training of Accent-invariant Networks for End-to-end Speech Recognition
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29]  arXiv:1911.11853 (cross-list from eess.AS) [pdf, other]
Title: Neural Percussive Synthesis Parameterised by High-Level Timbral Features
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)