Audio and Speech Processing

Authors and titles for eess.AS in Nov 2019

Total of 155 entries; entries 1-25 shown.

[1] arXiv:1911.00137 [pdf, other]: Title: Modeling of Rakugo Speech and Its Limitations: Toward Speech Synthesis That Entertains Audiences

Authors: Shuhei Kato, Yusuke Yasuda, Xin Wang, Erica Cooper, Shinji Takaki, Junichi Yamagishi

Comments: Resubmitted to IEEE Access

Subjects: Audio and Speech Processing (eess.AS)
[2] arXiv:1911.00432 [pdf, other]: Title: Deep neural networks for emotion recognition combining audio and transcripts

Authors: Jaejin Cho, Raghavendra Pappagari, Purva Kulkarni, Jesus Villalba, Yishay Carmiel, Najim Dehak

Subjects: Audio and Speech Processing (eess.AS)
[3] arXiv:1911.00527 [pdf, other]: Title: Memory Requirement Reduction of Deep Neural Networks Using Low-bit Quantization of Parameters

Authors: Niccoló Nicodemo, Gaurav Naithani, Konstantinos Drossos, Tuomas Virtanen, Roberto Saletti

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Performance (cs.PF); Sound (cs.SD)
[4] arXiv:1911.00566 [pdf, other]: Title: Predicting word error rate for reverberant speech

Authors: Hannes Gamper, Dimitra Emmanouilidou, Sebastian Braun, Ivan J. Tashev

Comments: Presented at IEEE 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[5] arXiv:1911.00940 [pdf, other]: Title: Robust speaker recognition using unsupervised adversarial invariance

Authors: Raghuveer Peri, Monisankha Pal, Arindam Jati, Krishna Somandepalli, Shrikanth Narayanan

Comments: Submitted to ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[6] arXiv:1911.00982 [pdf, other]: Title: Onssen: an open-source speech separation and enhancement library

Authors: Zhaoheng Ni, Michael I Mandel

Comments: Submitted to ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:1911.01182 [pdf, other]: Title: Voice Biometrics Security: Extrapolating False Alarm Rate via Hierarchical Bayesian Modeling of Speaker Verification Scores

Authors: Alexey Sholokhov, Tomi Kinnunen, Ville Vestman, Kong Aik Lee

Comments: Accepted to be published in Computer Speech and Language

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[8] arXiv:1911.01255 [pdf, other]: Title: pyannote.audio: neural building blocks for speaker diarization

Authors: Hervé Bredin, Ruiqing Yin, Juan Manuel Coria, Gregory Gelly, Pavel Korshunov, Marvin Lavechin, Diego Fustes, Hadrien Titeux, Wassim Bouaziz, Marie-Philippe Gill

Comments: Submitted to ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:1911.01266 [pdf, other]: Title: Supervised online diarization with sample mean loss for multi-domain data

Authors: Enrico Fini, Alessio Brutti

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[10] arXiv:1911.01533 [pdf, other]: Title: Speaker-invariant Affective Representation Learning via Adversarial Training

Authors: Haoqi Li, Ming Tu, Jing Huang, Shrikanth Narayanan, Panayiotis Georgiou

Comments: Accepted by ICASSP 2020; 5 pages

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[11] arXiv:1911.01601 [pdf, other]: Title: ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

Authors: Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Hector Delgado, Andreas Nautsch, Nicholas Evans, Md Sahidullah, Ville Vestman, Tomi Kinnunen, Kong Aik Lee, Lauri Juvela, Paavo Alku, Yu-Huai Peng, Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Sebastien Le Maguer, Markus Becker, Fergus Henderson, Rob Clark, Yu Zhang, Quan Wang, Ye Jia, Kai Onuma, Koji Mushika, Takashi Kaneda, Yuan Jiang, Li-Juan Liu, Yi-Chiao Wu, Wen-Chin Huang, Tomoki Toda, Kou Tanaka, Hirokazu Kameoka, Ingmar Steiner, Driss Matrouf, Jean-Francois Bonastre, Avashna Govender, Srikanth Ronanki, Jing-Xuan Zhang, Zhen-Hua Ling

Comments: Accepted, Computer Speech and Language. This manuscript version is made available under the CC-BY-NC-ND 4.0. For the published version on Elsevier website, please visit this https URL

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD); Signal Processing (eess.SP)
[12] arXiv:1911.01635 [pdf, other]: Title: Emotional speech synthesis with rich and granularized control

Authors: Se-Yun Um, Sangshin Oh, Kyungguen Byun, Inseon Jang, Chunghyun Ahn, Hong-Goo Kang

Comments: Submitted to ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:1911.01799 [pdf, ps, other]: Title: CN-CELEB: a challenging Chinese speaker recognition dataset

Authors: Yue Fan, Jiawen Kang, Lantian Li, Kaicheng Li, Haolin Chen, Sitong Cheng, Pengyuan Zhang, Ziya Zhou, Yunqi Cai, Dong Wang

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[14] arXiv:1911.01802 [pdf, other]: Title: Fast acoustic scattering using convolutional neural networks

Authors: Ziqi Fan, Vibhav Vineet, Hannes Gamper, Nikunj Raghuvanshi

Comments: Accepted by ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV); Signal Processing (eess.SP)
[15] arXiv:1911.01803 [pdf, other]: Title: Temporal Feedback Convolutional Recurrent Neural Networks for Speech Command Recognition

Authors: Taejun Kim, Juhan Nam

Comments: This paper is accepted to APSIPA ASC 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[16] arXiv:1911.01806 [pdf, other]: Title: Mixture factorized auto-encoder for unsupervised hierarchical deep factorization of speech signal

Authors: Zhiyuan Peng, Siyuan Feng, Tan Lee

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[17] arXiv:1911.01840 [pdf, other]: Title: Who is Real Bob? Adversarial Attacks on Speaker Recognition Systems

Authors: Guangke Chen, Sen Chen, Lingling Fan, Xiaoning Du, Zhe Zhao, Fu Song, Yang Liu

Comments: IEEE Symposium on Security and Privacy 2021

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[18] arXiv:1911.01902 [pdf, ps, other]: Title: Speech Enhancement via Deep Spectrum Image Translation Network

Authors: Hamidreza Baradaran Kashani, Ata Jodeiri, Mohammad Mohsen Goodarzi, Iman Sarraf Rezaei

Comments: Accepted at ICBME 2019

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19] arXiv:1911.02086 [pdf, other]: Title: Small-Footprint Keyword Spotting on Raw Audio Data with Sinc-Convolutions

Authors: Simon Mittermaier, Ludwig Kürzinger, Bernd Waschneck, Gerhard Rigoll

Comments: Accepted at ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[20] arXiv:1911.02091 [pdf, other]: Title: Closing the Training/Inference Gap for Deep Attractor Networks

Authors: Cyril Cadoux, Stefan Uhlich, Marc Ferras, Yuki Mitsufuji

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21] arXiv:1911.02115 [pdf, ps, other]: Title: Spatial Attention for Far-field Speech Recognition with Deep Beamforming Neural Networks

Authors: Weipeng He, Lu Lu, Biqiao Zhang, Jay Mahadeokar, Kaustubh Kalgaonkar, Christian Fuegen

Comments: To be presented at ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22] arXiv:1911.02216 [pdf, ps, other]: Title: Addressing Ambiguity of Emotion Labels Through Meta-Learning

Authors: Takuya Fujioka, Dario Bertero, Takeshi Homma, Kenji Nagamatsu

Comments: Submitted to ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[23] arXiv:1911.02242 [pdf, other]: Title: A comparison of end-to-end models for long-form speech recognition

Authors: Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu

Comments: ASRU camera-ready version

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[24] arXiv:1911.02388 [pdf, other]: Title: The Speed Submission to DIHARD II: Contributions & Lessons Learned

Authors: Md Sahidullah, Jose Patino, Samuele Cornell, Ruiqing Yin, Sunit Sivasankaran, Hervé Bredin, Pavel Korshunov, Alessio Brutti, Romain Serizel, Emmanuel Vincent, Nicholas Evans, Sébastien Marcel, Stefano Squartini, Claude Barras

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[25] arXiv:1911.02746 [pdf, other]: Title: Mask-dependent Phase Estimation for Monaural Speaker Separation

Authors: Zhaoheng Ni, Michael I Mandel

Comments: Accepted by ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS)
