Sound

Authors and titles for cs.SD in Apr 2019

[ total of 169 entries: 1-25 | 26-50 | 51-75 | 76-100 | ... | 151-169 ]
[1]  arXiv:1904.00055 [pdf, other]
Title: Joining Sound Event Detection and Localization Through Spatial Segregation
Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:1904.00063 [pdf, other]
Title: Multi-Scale Time-Frequency Attention for Acoustic Event Detection
Comments: Accepted by Interspeech 2019
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3]  arXiv:1904.00202 [pdf, other]
Title: Static Visual Spatial Priors for DoA Estimation
Comments: 6 pages, 6 figures, 3 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4]  arXiv:1904.01578 [pdf, other]
Title: Unsupervised training of neural mask-based beamforming
Comments: Correction to Eq. 11: Hermite symbol was on the wrong variable. Replaces y with the normalized version
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Machine Learning (stat.ML)
[5]  arXiv:1904.01916 [pdf, other]
Title: End-to-end Binaural Sound Localisation from the Raw Waveform
Comments: Accepted by ICASSP 2019
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6]  arXiv:1904.02096 [pdf, other]
Title: GEDI: Gammachirp Envelope Distortion Index for Predicting Intelligibility of Enhanced Speech
Comments: Preprint, 37 pages, 6 tables, 9 figures
Journal-ref: Speech Communication, Vol. 123, pp. 43-58, 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:1904.02334 [pdf, other]
Title: Multi-modal Blind Source Separation with Microphones and Blinkies
Comments: Accepted at IEEE ICASSP 2019, Brighton, UK. 5 pages. 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8]  arXiv:1904.02882 [pdf, other]
Title: LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Comments: Submitted for Interspeech 2019, 7 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9]  arXiv:1904.02892 [pdf, ps, other]
Title: WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation
Comments: Submitted to INTERSPEECH 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[10]  arXiv:1904.03065 [pdf, other]
Title: Recursive speech separation for unknown number of speakers
Comments: Interspeech 2019 (oral)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11]  arXiv:1904.03418 [pdf, other]
Title: Towards Generalized Speech Enhancement with Generative Adversarial Networks
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12]  arXiv:1904.03476 [pdf, other]
Title: Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems
Comments: 5 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13]  arXiv:1904.03479 [pdf, other]
Title: Large Margin Softmax Loss for Speaker Verification
Authors: Yi Liu, Liang He, Jia Liu
Comments: submitted to Interspeech 2019. The code and models have been released
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14]  arXiv:1904.03522 [pdf, other]
Title: Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data
Comments: Accepted to EUSIPCO 2020
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15]  arXiv:1904.03543 [pdf, ps, other]
Title: Spatio-Temporal Attention Pooling for Audio Scene Classification
Comments: To appear at the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[16]  arXiv:1904.03617 [pdf, other]
Title: VAE-based regularization for deep speaker embedding
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17]  arXiv:1904.03787 [pdf, other]
Title: Bayesian Non-Parametric Multi-Source Modelling Based Determined Blind Source Separation
Comments: 5 pages, 2 figures. Accepted at ICASSP 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18]  arXiv:1904.03814 [pdf, other]
Title: Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
Comments: In INTERSPEECH 2019
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[19]  arXiv:1904.03833 [pdf, other]
Title: Direct Modelling of Speech Emotion from Raw Speech
Comments: INTERSPEECH 2019
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:1904.03841 [pdf, other]
Title: Duration robust weakly supervised sound event detection
Comments: Accepted by ICASSP 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:1904.04540 [pdf, ps, other]
Title: Crossmodal Voice Conversion
Comments: Submitted to Interspeech 2019
Subjects: Sound (cs.SD); Machine Learning (stat.ML)
[22]  arXiv:1904.04631 [pdf, other]
Title: CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion
Comments: Accepted to ICASSP 2019. Project page: this http URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[23]  arXiv:1904.04956 [pdf, other]
Title: Distributed Deep Learning Strategies For Automatic Speech Recognition
Comments: Published in ICASSP'19
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[24]  arXiv:1904.05009 [pdf, other]
Title: An Interactive Musical Prediction System with Mixture Density Recurrent Neural Networks
Comments: Accepted for presentation at the International Conference on New Interfaces for Musical Expression (NIME), June 2019
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[25]  arXiv:1904.05073 [pdf, other]
Title: Neuralogram: A Deep Neural Network Based Representation for Audio Signals
Comments: Submitted to DAFx 2019, the 22nd International Conference on Digital Audio Effects, Birmingham, United Kingdom
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)