Sound

Authors and titles for cs.SD in Apr 2019

Total of 169 entries; showing entries 1-25.

[1] arXiv:1904.00055 [pdf, other]: Title: Joining Sound Event Detection and Localization Through Spatial Segregation

Authors: Ivo Trowitzsch, Christopher Schymura, Dorothea Kolossa, Klaus Obermayer

Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:1904.00063 [pdf, other]: Title: Multi-Scale Time-Frequency Attention for Acoustic Event Detection

Authors: Jingyang Zhang, Wenhao Ding, Jintao Kang, Liang He

Comments: Accepted by Interspeech 2019

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3] arXiv:1904.00202 [pdf, other]: Title: Static Visual Spatial Priors for DoA Estimation

Authors: Pawel Swietojanski, Ondrej Miksik

Comments: 6 pages, 6 figures, 3 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:1904.01578 [pdf, other]: Title: Unsupervised training of neural mask-based beamforming

Authors: Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach

Comments: Correction to Eq. 11: Hermite symbol was on the wrong variable. Replaces y with the normalized version

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Machine Learning (stat.ML)
[5] arXiv:1904.01916 [pdf, other]: Title: End-to-end Binaural Sound Localisation from the Raw Waveform

Authors: Paolo Vecchiotti, Ning Ma, Stefano Squartini, Guy J. Brown

Comments: Accepted by ICASSP 2019

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:1904.02096 [pdf, other]: Title: GEDI: Gammachirp Envelope Distortion Index for Predicting Intelligibility of Enhanced Speech

Authors: Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani

Comments: Preprint, 37 pages, 6 tables, 9 figures

Journal-ref: Speech Communication, Vol. 123, pp. 43-58, 2020

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:1904.02334 [pdf, other]: Title: Multi-modal Blind Source Separation with Microphones and Blinkies

Authors: Robin Scheibler, Nobutaka Ono

Comments: Accepted at IEEE ICASSP 2019, Brighton, UK. 5 pages. 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:1904.02882 [pdf, other]: Title: LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

Authors: Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu

Comments: Submitted for Interspeech 2019, 7 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:1904.02892 [pdf, ps, other]: Title: WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation

Authors: Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, Nobukatsu Hojo

Comments: Submitted to INTERSPEECH2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[10] arXiv:1904.03065 [pdf, other]: Title: Recursive speech separation for unknown number of speakers

Authors: Naoya Takahashi, Sudarsanam Parthasaarathy, Nabarun Goswami, Yuki Mitsufuji

Comments: Interspeech 2019 (oral)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:1904.03418 [pdf, other]: Title: Towards Generalized Speech Enhancement with Generative Adversarial Networks

Authors: Santiago Pascual, Joan Serrà, Antonio Bonafonte

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12] arXiv:1904.03476 [pdf, other]: Title: Cross-task learning for audio tagging, sound event detection and spatial localization: DCASE 2019 baseline systems

Authors: Qiuqiang Kong, Yin Cao, Turab Iqbal, Yong Xu, Wenwu Wang, Mark D. Plumbley

Comments: 5 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[13] arXiv:1904.03479 [pdf, other]: Title: Large Margin Softmax Loss for Speaker Verification

Authors: Yi Liu, Liang He, Jia Liu

Comments: submitted to Interspeech 2019. The code and models have been released

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:1904.03522 [pdf, other]: Title: Taco-VC: A Single Speaker Tacotron based Voice Conversion with Limited Data

Authors: Roee Levy Leshem, Raja Giryes

Comments: Accepted to EUSIPCO 2020

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:1904.03543 [pdf, ps, other]: Title: Spatio-Temporal Attention Pooling for Audio Scene Classification

Authors: Huy Phan, Oliver Y. Chén, Lam Pham, Philipp Koch, Maarten De Vos, Ian McLoughlin, Alfred Mertins

Comments: To appear at the 20th Annual Conference of the International Speech Communication Association (INTERSPEECH 2019)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[16] arXiv:1904.03617 [pdf, other]: Title: VAE-based regularization for deep speaker embedding

Authors: Yang Zhang, Lantian Li, Dong Wang

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17] arXiv:1904.03787 [pdf, other]: Title: Bayesian Non-Parametric Multi-Source Modelling Based Determined Blind Source Separation

Authors: Chaitanya Narisetty, Tatsuya Komatsu, Reishi Kondo

Comments: 5 pages, 2 figures. Accepted at ICASSP 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18] arXiv:1904.03814 [pdf, other]: Title: Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

Authors: Seungwoo Choi, Seokjun Seo, Beomjun Shin, Hyeongmin Byun, Martin Kersner, Beomsu Kim, Dongyoung Kim, Sungjoo Ha

Comments: In INTERSPEECH 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[19] arXiv:1904.03833 [pdf, other]: Title: Direct Modelling of Speech Emotion from Raw Speech

Authors: Siddique Latif, Rajib Rana, Sara Khalifa, Raja Jurdak, Julien Epps

Comments: INTERSPEECH 2019

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:1904.03841 [pdf, other]: Title: Duration robust weakly supervised sound event detection

Authors: Heinrich Dinkel, Kai Yu

Comments: Accepted by ICASSP2020

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:1904.04540 [pdf, ps, other]: Title: Crossmodal Voice Conversion

Authors: Hirokazu Kameoka, Kou Tanaka, Aaron Valero Puche, Yasunori Ohishi, Takuhiro Kaneko

Comments: Submitted to Interspeech2019

Subjects: Sound (cs.SD); Machine Learning (stat.ML)
[22] arXiv:1904.04631 [pdf, other]: Title: CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo

Comments: Accepted to ICASSP 2019. Project page: this http URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[23] arXiv:1904.04956 [pdf, other]: Title: Distributed Deep Learning Strategies For Automatic Speech Recognition

Authors: Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, George Saon, David Kung, Michael Picheny

Comments: Published in ICASSP'19

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[24] arXiv:1904.05009 [pdf, other]: Title: An Interactive Musical Prediction System with Mixture Density Recurrent Neural Networks

Authors: Charles P Martin, Jim Torresen

Comments: Accepted for presentation at the International Conference on New Interfaces for Musical Expression (NIME), June 2019

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[25] arXiv:1904.05073 [pdf, other]: Title: Neuralogram: A Deep Neural Network Based Representation for Audio Signals

Authors: Prateek Verma, Chris Chafe, Jonathan Berger

Comments: Submitted to DAFx 2019, the 22nd International Conference on Digital Audio Effects, Birmingham, United Kingdom

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
