Audio and Speech Processing

Authors and titles for eess.AS in Nov 2020

[ total of 227 entries: 1-227 ]

[1] arXiv:2011.00030 [pdf, other]: Title: A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis

Authors: Shanshan Wang, Annamaria Mesaros, Toni Heittola, Tuomas Virtanen

Comments: accepted by ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS)
[2] arXiv:2011.00091 [pdf, other]: Title: Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

Authors: Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu

Comments: submitted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[3] arXiv:2011.00175 [pdf, other]: Title: Multimodal Urban Sound Tagging with Spatiotemporal Context

Authors: Jisheng Bai, Jianfeng Chen, Mou Wang

Subjects: Audio and Speech Processing (eess.AS)
[4] arXiv:2011.00316 [pdf, other]: Title: AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization

Authors: Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-yi Lee

Comments: Submitted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[5] arXiv:2011.00502 [pdf, other]: Title: Focusing Phenomena in Linear Discrete Inverse Problems in Acoustics

Authors: Eric C. Hamdan, Filippo Maria Fazi

Comments: 33 pages, 23 figures, submitted for review to the Journal of Sound and Vibration; fixed typos and minor revision in sections 6.1.4-6.1.5 and 6.2

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[6] arXiv:2011.00699 [pdf, other]: Title: Transformer-based Arabic Dialect Identification

Authors: Wanqiu Lin, Maulik Madhavi, Rohan Kumar Das, Haizhou Li

Comments: Accepted for publication in International Conference on Asian Language Processing (IALP) 2020

Subjects: Audio and Speech Processing (eess.AS)
[7] arXiv:2011.00721 [pdf, other]: Title: Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations

Authors: Purvi Agrawal, Sriram Ganapathy

Comments: arXiv admin note: text overlap with arXiv:2001.07067

Journal-ref: Proc. Interspeech 2020, 1649-1653 (2020)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2011.00935 [pdf, other]: Title: FeatherTTS: Robust and Efficient attention based Neural TTS

Authors: Qiao Tian, Zewang Zhang, Chao Liu, Heng Lu, Linghui Chen, Bin Wei, Pujiang He, Shan Liu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2011.01108 [pdf, ps, other]: Title: End-to-end anti-spoofing with RawNet2

Authors: Hemlata Tak, Jose Patino, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans, Anthony Larcher

Comments: Accepted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS)
[10] arXiv:2011.01130 [pdf, other]: Title: Speaker anonymisation using the McAdams coefficient

Authors: Jose Patino, Natalia Tomashenko, Massimiliano Todisco, Andreas Nautsch, Nicholas Evans

Comments: Accepted at INTERSPEECH 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[11] arXiv:2011.01174 [pdf, other]: Title: Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech

Authors: Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim

Comments: 9 pages, 5 figures, 4 tables

Journal-ref: IEEE Access, vol. 10, pp. 52621 - 52629, 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[12] arXiv:2011.01175 [pdf, other]: Title: CAMP: a Two-Stage Approach to Modelling Prosody in Context

Authors: Zack Hodari, Alexis Moinet, Sri Karlapati, Jaime Lorenzo-Trueba, Thomas Merritt, Arnaud Joly, Ammar Abbas, Penny Karanasou, Thomas Drugman

Comments: 5 pages. Published in the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

Subjects: Audio and Speech Processing (eess.AS)
[13] arXiv:2011.01210 [pdf, other]: Title: Focus on the present: a regularization method for the ASR source-target attention layer

Authors: Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

Comments: submitted to ICASSP2021. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[14] arXiv:2011.01557 [pdf, other]: Title: StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization

Authors: Ahmed Mustafa, Nicola Pia, Guillaume Fuchs

Comments: Accepted to ICASSP2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[15] arXiv:2011.01570 [pdf, other]: Title: Dynamic latency speech recognition with asynchronous revision

Authors: Mingkun Huang, Meng Cai, Jun Zhang, Yang Zhang, Yongbin You, Yi He, Zejun Ma

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2011.01576 [pdf, other]: Title: Improving RNN transducer with normalized jointer network

Authors: Mingkun Huang, Jun Zhang, Meng Cai, Yang Zhang, Jiali Yao, Yongbin You, Yi He, Zejun Ma

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17] arXiv:2011.01678 [pdf, other]: Title: Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

Authors: Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

Comments: Accepted to Interspeech 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[18] arXiv:2011.01686 [pdf, ps, other]: Title: Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization

Authors: Disong Wang, Jianwei Yu, Xixin Wu, Lifa Sun, Xunying Liu, Helen Meng

Comments: To appear in ISCSLP2021

Subjects: Audio and Speech Processing (eess.AS)
[19] arXiv:2011.01691 [pdf, other]: Title: A Study of Incorporating Articulatory Movement Information in Speech Enhancement

Authors: Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Xugang Lu, Yu Tsao

Subjects: Audio and Speech Processing (eess.AS)
[20] arXiv:2011.01965 [pdf, ps, other]: Title: Short-time deep-learning based source separation for speech enhancement in reverberant environments with beamforming

Authors: Alejandro Díaz, Diego Pincheira, Rodrigo Mahu, Nestor Becerra Yoma

Subjects: Audio and Speech Processing (eess.AS)
[21] arXiv:2011.01986 [pdf, other]: Title: Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features

Authors: Man-Ling Sung, Siyuan Feng, Tan Lee

Comments: 8 pages, accepted and presented in APSIPA-APC 2018. This work was done when Man-Ling Sung and Siyuan Feng were postgraduate students in the Chinese University of Hong Kong

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[22] arXiv:2011.01991 [pdf, other]: Title: Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

Authors: Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

Comments: 8 pages, 2 figures, SLT 2021

Journal-ref: 2021 IEEE Spoken Language Technology Workshop (SLT)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[23] arXiv:2011.01997 [pdf, other]: Title: DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs

Authors: Desh Raj, Leibny Paola Garcia-Perera, Zili Huang, Shinji Watanabe, Daniel Povey, Andreas Stolcke, Sanjeev Khudanpur

Comments: Accepted to IEEE SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24] arXiv:2011.02008 [pdf, other]: Title: Complex ratio masking for singing voice separation

Authors: Yixuan Zhang, Yuzhou Liu, DeLiang Wang

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25] arXiv:2011.02014 [pdf, other]: Title: Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

Authors: Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey

Comments: Accepted to IEEE SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[26] arXiv:2011.02090 [pdf, other]: Title: Frustratingly Easy Noise-aware Training of Acoustic Models

Authors: Desh Raj, Jesus Villalba, Daniel Povey, Sanjeev Khudanpur

Comments: 6 + 3 (Appendix) pages

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[27] arXiv:2011.02102 [pdf, other]: Title: Robust Speaker Extraction Network Based on Iterative Refined Adaptation

Authors: Chengyun Deng, Shiqian Ma, Yi Zhang, Yongtao Sha, Hui Zhang, Hui Song, Xiangang Li

Comments: Accepted by Interspeech 2021

Subjects: Audio and Speech Processing (eess.AS)
[28] arXiv:2011.02109 [pdf, ps, other]: Title: Deep Multi-task Network for Delay Estimation and Echo Cancellation

Authors: Yi Zhang, Chengyun Deng, Shiqian Ma, Yongtao Sha, Hui Song

Comments: Accepted by Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS)
[29] arXiv:2011.02132 [pdf, other]: Title: Multi-Modal Transformers Utterance-Level Code-Switching Detection

Authors: Krishna D N

Comments: 8 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS)
[30] arXiv:2011.02136 [pdf, other]: Title: Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting

Authors: Purvi Agrawal, Sriram Ganapathy

Comments: arXiv admin note: text overlap with arXiv:2011.00721

Journal-ref: IEEE Transactions on Audio, Speech and Language Processing, Vol. 28, pp. 2823 - 2836, 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[31] arXiv:2011.02168 [pdf, other]: Title: Learning in your voice: Non-parallel voice conversion based on speaker consistency loss

Authors: Yoohwan Kwon, Soo-Whan Chung, Hee-Soo Heo, Hong-Goo Kang

Comments: Submitted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS)
[32] arXiv:2011.02252 [pdf, other]: Title: Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech

Authors: Sri Karlapati, Ammar Abbas, Zack Hodari, Alexis Moinet, Arnaud Joly, Penny Karanasou, Thomas Drugman

Comments: 5 pages and 3 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[33] arXiv:2011.02421 [pdf, other]: Title: One-shot conditional audio filtering of arbitrary sounds

Authors: Beat Gfeller, Dominik Roblek, Marco Tagliasacchi

Subjects: Audio and Speech Processing (eess.AS)
[34] arXiv:2011.02561 [pdf, other]: Title: A Multi-Channel Temporal Attention Convolutional Neural Network Model for Environmental Sound Classification

Authors: You Wang, Chuyao Feng, David V. Anderson

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[35] arXiv:2011.02619 [pdf, ps, other]: Title: Don't look back: an online beat tracking method using RNN and enhanced particle filtering

Authors: Mojtaba Heydari, Zhiyao Duan

Comments: Accepted at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36] arXiv:2011.02698 [pdf, other]: Title: A Comparison Study on Infant-Parent Voice Diarization

Authors: Junzhe Zhu, Mark Hasegawa-Johnson, Nancy McElwain

Comments: ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[37] arXiv:2011.02774 [pdf, other]: Title: Multi-Accent Adaptation based on Gate Mechanism

Authors: Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan

Comments: Accepted in INTERSPEECH 2019

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[38] arXiv:2011.02782 [pdf, other]: Title: Domain Adaptation Using Class Similarity for Robust Speech Recognition

Authors: Han Zhu, Jiangjiang Zhao, Yuling Ren, Li Wang, Pengyuan Zhang

Comments: Accepted in INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[39] arXiv:2011.02900 [pdf, other]: Title: Multi-class Spectral Clustering with Overlaps for Speaker Diarization

Authors: Desh Raj, Zili Huang, Sanjeev Khudanpur

Comments: Accepted at IEEE SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40] arXiv:2011.02921 [pdf, ps, other]: Title: Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR

Authors: Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka

Comments: Submitted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2006.10930, arXiv:2008.04546

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[41] arXiv:2011.02949 [pdf, other]: Title: Anomalous Sound Detection as a Simple Binary Classification Problem with Careful Selection of Proxy Outlier Examples

Authors: Paul Primus, Verena Haunschmid, Patrick Praher, Gerhard Widmer

Comments: published in DCASE 2020 Workshop

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[42] arXiv:2011.03110 [pdf, other]: Title: Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription

Authors: Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka

Comments: Accepted to SLT2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43] arXiv:2011.03115 [pdf, ps, other]: Title: A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery

Authors: Bolaji Yusuf, Lucas Ondel, Lukas Burget, Jan Cernocky, Murat Saraclar

Comments: Submitted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[44] arXiv:2011.03118 [pdf, other]: Title: Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages

Authors: Trideba Padhi, Astik Biswas, Febe De Wet, Ewald van der Westhuizen, Thomas Niesler

Comments: In Proceedings of The First Workshop on Speech Technologies for Code-Switching in Multilingual Communities

Journal-ref: http://festvox.org/cedar/WSTCSMC2020.pdf

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[45] arXiv:2011.03426 [src]: Title: Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement

Authors: Aswin Sivaraman, Minje Kim

Comments: This work has been superseded by article 2104.02017

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[46] arXiv:2011.03432 [pdf, ps, other]: Title: Misalignment Recognition in Acoustic Sensor Networks using a Semi-supervised Source Estimation Method and Markov Random Fields

Authors: Gabriel F Miller, Andreas Brendel, Walter Kellermann, Sharon Gannot

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47] arXiv:2011.03706 [pdf, other]: Title: ESPnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration

Authors: Chenda Li, Jing Shi, Wangyou Zhang, Aswin Shanmugam Subramanian, Xuankai Chang, Naoyuki Kamo, Moto Hira, Tomoki Hayashi, Christoph Boeddeker, Zhuo Chen, Shinji Watanabe

Comments: Accepted by SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[48] arXiv:2011.03810 [pdf, other]: Title: Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks

Authors: Sneha Das, Tom Bäckström

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[49] arXiv:2011.03943 [pdf, other]: Title: Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement

Authors: Daxin Tan, Tan Lee

Comments: Accepted by Interspeech 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50] arXiv:2011.04084 [pdf, other]: Title: Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

Authors: Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li

Comments: Accepted at SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[51] arXiv:2011.04359 [pdf, ps, other]: Title: An Empirical Study of Visual Features for DNN based Audio-Visual Speech Enhancement in Multi-talker Environments

Authors: Shrishti Saha Shetu, Soumitro Chakrabarty, Emanuël A. P. Habets

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[52] arXiv:2011.04456 [pdf, other]: Title: Efficient Training Data Generation for Phase-Based DOA Estimation

Authors: Fabian Hübner, Wolfgang Mack, Emanuël A. P. Habets

Comments: Submitted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53] arXiv:2011.04569 [pdf, other]: Title: Informed Source Extraction With Application to Acoustic Echo Reduction

Authors: Mohamed Elminshawi, Wolfgang Mack, Emanuël A. P. Habets

Comments: Published at ITG 2021

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[54] arXiv:2011.04785 [pdf, ps, other]: Title: Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR

Authors: Xiaohui Zhang, Frank Zhang, Chunxi Liu, Kjell Schubert, Julian Chan, Pradyot Prakash, Jun Liu, Ching-Feng Yeh, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig

Comments: Accepted for publication at IEEE Spoken Language Technology Workshop (SLT), 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[55] arXiv:2011.04896 [pdf, ps, other]: Title: An Empirical Study on Text-Independent Speaker Verification based on the GE2E Method

Authors: Soroosh Tayebi Arasteh

Comments: 6 pages, 7 tables, 2 figures, 4 algorithms. An empirical study on the paper arXiv:1710.10467 by Wan et al. (2017)

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[56] arXiv:2011.05038 [pdf, other]: Title: Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model

Authors: Haoyu Li, Yang Ai, Junichi Yamagishi

Comments: 8 pages. Accepted to IEEE SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57] arXiv:2011.05161 [pdf, other]: Title: Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis

Authors: Guanghui Xu, Wei Song, Zhengchen Zhang, Chao Zhang, Xiaodong He, Bowen Zhou

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[58] arXiv:2011.05540 [pdf, other]: Title: Surrogate Source Model Learning for Determined Source Separation

Authors: Robin Scheibler, Masahito Togami

Comments: 5 pages, 3 figures, 1 table. Submitted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[59] arXiv:2011.05649 [pdf, other]: Title: Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients

Authors: Huahuan Zheng, Keyu An, Zhijian Ou

Comments: Accepted by IEEE SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[60] arXiv:2011.05707 [pdf, other]: Title: Low-resource expressive text-to-speech using data augmentation

Authors: Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[61] arXiv:2011.05731 [pdf, other]: Title: FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation

Authors: Songxiang Liu, Yuewen Cao, Na Hu, Dan Su, Helen Meng

Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME) 2021

Subjects: Audio and Speech Processing (eess.AS)
[62] arXiv:2011.05958 [pdf, other]: Title: On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments

Authors: Jisi Zhang, Catalin Zorila, Rama Doddipatla, Jon Barker

Comments: Presented at IEEE ICASSP 2020

Journal-ref: Proc. ICASSP (2020) 6389-6393

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[63] arXiv:2011.06110 [pdf, other]: Title: Efficient Knowledge Distillation for RNN-Transducer Models

Authors: Sankaran Panchapagesan, Daniel S. Park, Chung-Cheng Chiu, Yuan Shangguan, Qiao Liang, Alexander Gruenstein

Comments: 5 pages, 1 figure, 2 tables; submitted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64] arXiv:2011.06239 [pdf, other]: Title: The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge

Authors: Si-Ioi Ng, Wei Liu, Zhiyuan Peng, Siyuan Feng, Hing-Pang Huang, Odette Scharenborg, Tan Lee

Comments: Submitted to 2021 SLT Children Speech Recognition Challenge (CSRC)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[65] arXiv:2011.06465 [pdf, other]: Title: Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis

Authors: Chung-Ming Chien, Hung-yi Lee

Comments: Accepted by SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[66] arXiv:2011.06548 [pdf, other]: Title: Evaluating the Intelligibility Benefits of Neural Speech Enrichment for Listeners with Normal Hearing and Hearing Impairment using the Greek Harvard Corpus

Authors: Muhammed PV Shifas, Anna Sfakianaki, Theognosia Chimona, Yannis Stylianou

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[67] arXiv:2011.06739 [pdf, other]: Title: Generalized Dilated CNN Models for Depression Detection Using Inverted Vocal Tract Variables

Authors: Nadee Seneviratne, Carol Espy-Wilson

Comments: 5 pages, Submitted to Interspeech 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[68] arXiv:2011.07065 [pdf, other]: Title: Multi-Modal Emotion Detection with Transfer Learning

Authors: Amith Ananthram, Kailash Karthik Saravanakumar, Jessica Huynh, Homayoon Beigi

Comments: 11 pages, 7 tables, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[69] arXiv:2011.07274 [pdf, other]: Title: On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks

Authors: Serkan Sulun, Matthew E. P. Davies

Comments: Qualitative examples on this https URL. Source code on this https URL.

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[70] arXiv:2011.07338 [pdf, other]: Title: Distortion-controlled Training for End-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss

Authors: Yi Luo, Cong Han, Nima Mesgarani

Comments: SLT 2021

Subjects: Audio and Speech Processing (eess.AS)
[71] arXiv:2011.07545 [pdf, ps, other]: Title: Automatic dysarthric speech detection exploiting pairwise distance-based convolutional neural networks

Authors: P. Janbakhshi, I. Kodrasi, H. Bourlard

Comments: accepted at ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS)
[72] arXiv:2011.07547 [pdf, other]: Title: Multi-task single channel speech enhancement using speech presence probability as a secondary task training target

Authors: L. Wang, J. Zhu, I. Kodrasi

Comments: EUSIPCO 2021

Subjects: Audio and Speech Processing (eess.AS)
[73] arXiv:2011.07755 [pdf, other]: Title: Audio-visual Multi-channel Integration and Recognition of Overlapped Speech

Authors: Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu

Comments: TASLP 2021

Subjects: Audio and Speech Processing (eess.AS)
[74] arXiv:2011.07791 [pdf, other]: Title: Block-Online Guided Source Separation

Authors: Shota Horiguchi, Yusuke Fujita, Kenji Nagamatsu

Comments: Accepted to SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[75] arXiv:2011.07859 [pdf, other]: Title: A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network

Authors: Thi Ngoc Tho Nguyen, Ngoc Khanh Nguyen, Huy Phan, Lam Pham, Kenneth Ooi, Douglas L. Jones, Woon-Seng Gan

Subjects: Audio and Speech Processing (eess.AS)
[76] arXiv:2011.08346 [pdf, other]: Title: Refining Automatic Speech Recognition System for older adults

Authors: Liu Chen, Meysam Asgari

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[77] arXiv:2011.08397 [pdf, other]: Title: Ultra-Lightweight Speech Separation via Group Communication

Authors: Yi Luo, Cong Han, Nima Mesgarani

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[78] arXiv:2011.08400 [pdf, other]: Title: Rethinking the Separation Layers in Speech Separation Networks

Authors: Yi Luo, Zhuo Chen, Cong Han, Chenda Li, Tianyan Zhou, Nima Mesgarani

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79] arXiv:2011.08401 [pdf, other]: Title: Implicit Filter-and-sum Network for Multi-channel Speech Separation

Authors: Yi Luo, Nima Mesgarani

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[80] arXiv:2011.08480 [pdf, other]: Title: s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis

Authors: Xi Wang, Huaiping Ming, Lei He, Frank K. Soong

Comments: 5 pages, 5 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[81] arXiv:2011.09044 [pdf, other]: Title: Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding

Authors: Bhuvan Agrawal, Markus Müller, Martin Radfar, Samridhi Choudhary, Athanasios Mouchtaris, Siegfried Kunzmann

Comments: 7 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[82] arXiv:2011.09162 [pdf, other]: Title: WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation

Authors: Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shixiong Zhang, Dong Yu, Michael I Mandel

Comments: accepted by SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[83] arXiv:2011.09270 [pdf, other]: Title: Respiratory Distress Detection from Telephone Speech using Acoustic and Prosodic Features

Authors: Meemnur Rashid, Kaisar Ahmed Alman, Khaled Hasan, John H.L. Hansen, Taufiq Hasan

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[84] arXiv:2011.09624 [pdf, other]: Title: Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals

Authors: Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Comments: Accepted in ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[85] arXiv:2011.09631 [pdf, other]: Title: Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains

Authors: Won Jang, Dan Lim, Jaesam Yoon

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[86] arXiv:2011.09804 [pdf, other]: Title: TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos

Authors: Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals

Comments: 8 pages, 4 figures, Accepted to SLT2021, IEEE Spoken Language Technology Workshop

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Image and Video Processing (eess.IV)
[87] arXiv:2011.10345 [pdf, other]: Title: Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement

Authors: Marvin Tammen, Simon Doclo

Comments: submitted to the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Ontario, Canada

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[88] arXiv:2011.10527 [pdf, other]: Title: Multi-Scale Speaker Diarization With Neural Affinity Score Fusion

Authors: Tae Jin Park, Manoj Kumar, Shrikanth Narayanan

Comments: Submitted to ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS)
[89] arXiv:2011.10538 [pdf, other]: Title: Improving RNN-T ASR Accuracy Using Context Audio

Authors: Andreas Schwarz, Ilya Sklyar, Simon Wiesler

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[90] arXiv:2011.10706 [pdf, other]: Title: Speech Denoising with Auditory Models

Authors: Mark R. Saddler, Andrew Francl, Jenelle Feather, Kaizhi Qian, Yang Zhang, Josh H. McDermott

Comments: First two authors contributed equally, 5 pages, 3 PDF figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[91] arXiv:2011.10798 [pdf, other]: Title: A Better and Faster End-to-End Model for Streaming ASR

Authors: Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

Comments: Accepted in ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[92] arXiv:2011.11315 [pdf, other]: Title: End-to-end Silent Speech Recognition with Acoustic Sensing

Authors: Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao

Comments: will be presented in SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[93] arXiv:2011.11564 [pdf, other]: Title: Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems

Authors: Xianrui Zheng, Yulan Liu, Deniz Gunceler, Daniel Willett (Amazon Alexa)

Comments: To appear in Proc. ICASSP2021, June 06-11, 2021, Toronto, Ontario, Canada

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94] arXiv:2011.11671 [pdf, other]: Title: Streaming Multi-speaker ASR with RNN-T

Authors: Ilya Sklyar, Anna Piunova, Yulan Liu

Comments: Accepted at ICASSP2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[95] arXiv:2011.11818 [pdf, other]: Title: Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech

Authors: Yiling Huang, Yutian Chen, Jason Pelecanos, Quan Wang

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[96] arXiv:2011.11984 [pdf, other]: Title: Integration of variational autoencoder and spatial clustering for adaptive multi-channel neural speech separation

Authors: Katerina Zmolikova, Marc Delcroix, Lukáš Burget, Tomohiro Nakatani, Jan "Honza" Černocký

Comments: 8 pages, 3 figures, to be published in SLT2021

Subjects: Audio and Speech Processing (eess.AS)
[97] arXiv:2011.12063 [pdf, other]: Title: How Far Are We from Robust Voice Conversion: A Survey

Authors: Tzu-hsien Huang, Jheng-hao Lin, Chien-yu Huang, Hung-yi Lee

Comments: Accepted by SLT 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[98] arXiv:2011.12133 [pdf, other]: Title: Zero-Shot Audio Classification via Semantic Embeddings

Authors: Huang Xie, Tuomas Virtanen

Comments: Submitted to Transactions on Audio, Speech and Language Processing

Subjects: Audio and Speech Processing (eess.AS)
[99] arXiv:2011.12206 [pdf, other]: Title: TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis

Authors: Qiao Tian, Yi Chen, Zewang Zhang, Heng Lu, Linghui Chen, Lei Xie, Shan Liu

Subjects: Audio and Speech Processing (eess.AS)
[100] arXiv:2011.12221 [pdf, ps, other]: Title: A light transformer for speech-to-intent applications

Authors: Pu Wang, Hugo Van hamme

Comments: To be published in SLT 2021

Subjects: Audio and Speech Processing (eess.AS)
[101] arXiv:2011.12564 [pdf, ps, other]: Title: Soft-Median Choice: An Automatic Feature Smoothing Method for Sound Event Detection

Authors: Fengnian Zhao, Ruwei Li, Xin Liu, Liwen Xu

Comments: 5 pages, 3 figures, 6 tables

Subjects: Audio and Speech Processing (eess.AS)
[102] arXiv:2011.12657 [pdf, other]: Title: Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections

Authors: Huang Xie, Okko Räsänen, Tuomas Virtanen

Comments: Accepted by ICASSP 2021

Subjects: Audio and Speech Processing (eess.AS)
[103] arXiv:2011.12696 [pdf, other]: Title: Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio

Authors: Manuel Giollo, Deniz Gunceler, Yulan Liu, Daniel Willett

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[104] arXiv:2011.12941 [pdf, other]: Title: Small Footprint Convolutional Recurrent Networks for Streaming Wakeword Detection

Authors: Mohammad Omar Khursheed, Christin Jose, Rajath Kumar, Gengshen Fu, Brian Kulis, Santosh Kumar Cheekatmalla

Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Audio and Speech Processing (eess.AS)
[105] arXiv:2011.12998 [pdf, other]: Title: VoxLingua107: a Dataset for Spoken Language Recognition

Authors: Jörgen Valk, Tanel Alumäe

Comments: Accepted at IEEE Spoken Language Technology Workshop (SLT) 2021

Subjects: Audio and Speech Processing (eess.AS)
[106] arXiv:2011.13090 [pdf, other]: Title: Multi-QuartzNet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion

Authors: Jian Luo, Jianzong Wang, Ning Cheng, Guilin Jiang, Jing Xiao

Comments: will be presented in SLT 2021

Subjects: Audio and Speech Processing (eess.AS)
[107] arXiv:2011.13834 [pdf, other]: Title: Transformer-based Online Speech Recognition with Decoder-end Adaptive Computation Steps

Authors: Mohan Li, Catalin Zorila, Rama Doddipatla

Comments: 7 pages, 1 figure, accepted at SLT 2021

Subjects: Audio and Speech Processing (eess.AS)
[108] arXiv:2011.14060 [pdf, other]: Title: Unsupervised Spoken Term Discovery on Untranscribed Speech

Authors: Man-Ling Sung

Comments: Thesis submitted in September 2019 for the M.Phil degree in Electronic Engineering at The Chinese University of Hong Kong (CUHK)

Subjects: Audio and Speech Processing (eess.AS)
[109] arXiv:2011.14062 [pdf, other]: Title: Unsupervised Spoken Term Discovery Based on Re-clustering of Hypothesized Speech Segments with Siamese and Triplet Networks

Authors: Man-Ling Sung, Tan Lee

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[110] arXiv:2011.02195 (cross-list from eess.SP) [pdf, other]: Title: Correlation based Multi-phasal models for improved imagined speech EEG recognition

Authors: Rini A Sharon, Hema A Murthy

Journal-ref: Interspeech SMM 2020

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111] arXiv:2011.08848 (cross-list from eess.SP) [pdf, ps, other]: Title: Deep Networks for Direction-of-Arrival Estimation in Low SNR

Authors: Georgios K. Papageorgiou, Mathini Sellathurai, Yonina C. Eldar

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2011.00196 (cross-list from cs.SD) [pdf, other]: Title: RespireNet: A Deep Neural Network for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting

Authors: Siddhartha Gairola, Francis Tom, Nipun Kwatra, Mohit Jain

Comments: Code visible at this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[113] arXiv:2011.00200 (cross-list from cs.SD) [pdf, other]: Title: The xx205 System for the VoxCeleb Speaker Recognition Challenge 2020

Authors: Xu Xiang

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[114] arXiv:2011.00695 (cross-list from cs.SD) [pdf, other]: Title: Learning generic feature representation with synthetic data for weakly-supervised sound event detection by inter-frame distance loss

Authors: Yuxin Huang, Liwei Lin, Xiangdong Wang, Hong Liu, Yueliang Qian, Min Liu, Kazushige Ouchi

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115] arXiv:2011.00747 (cross-list from cs.CL) [pdf, other]: Title: Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation

Authors: Hang Le, Juan Pino, Changhan Wang, Jiatao Gu, Didier Schwab, Laurent Besacier

Comments: Accepted at COLING 2020 (Oral)

Journal-ref: The 28th International Conference on Computational Linguistics (COLING 2020)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2011.00771 (cross-list from cs.LG) [pdf, ps, other]: Title: Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition

Authors: Jae-Jin Jeon, Eesung Kim

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2011.00773 (cross-list from cs.SD) [pdf, other]: Title: Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music

Authors: Ashish Ranjan, Varun Nagesh Jolly Behera, Motahar Reza

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[118] arXiv:2011.00782 (cross-list from cs.SD) [pdf, other]: Title: CVC: Contrastive Learning for Non-parallel Voice Conversion

Authors: Tingle Li, Yichen Liu, Chenxu Hu, Hang Zhao

Comments: Submitted to Interspeech 2021, Project Page: this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:2011.00801 (cross-list from cs.SD) [pdf, other]: Title: Sound Event Detection and Separation: a Benchmark on Desed Synthetic Soundscapes

Authors: Nicolas Turpault (MULTISPEECH), Romain Serizel (MULTISPEECH), Scott Wisdom, Hakan Erdogan, John Hershey, Eduardo Fonseca, Prem Seetharaman, Justin Salamon

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2011.00803 (cross-list from cs.SD) [pdf, other]: Title: What's All the FUSS About Free Universal Sound Separation Data?

Authors: Scott Wisdom, Hakan Erdogan, Daniel Ellis, Romain Serizel (MULTISPEECH), Nicolas Turpault (MULTISPEECH), Eduardo Fonseca, Justin Salamon, Prem Seetharaman, John Hershey

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121] arXiv:2011.01143 (cross-list from cs.SD) [pdf, other]: Title: Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds

Authors: Efthymios Tzinis, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

Comments: ICLR 2021, 27 pages

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[122] arXiv:2011.01151 (cross-list from cs.SD) [pdf, other]: Title: Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric

Authors: Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel

Comments: Accepted at ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[123] arXiv:2011.01447 (cross-list from cs.SD) [pdf, other]: Title: A Two-Stage Approach to Device-Robust Acoustic Scene Classification

Authors: Hu Hu, Chao-Han Huck Yang, Xianjun Xia, Xue Bai, Xin Tang, Yajian Wang, Shutong Niu, Li Chai, Juanjuan Li, Hongning Zhu, Feng Bao, Yuanjun Zhao, Sabato Marco Siniscalchi, Yannan Wang, Jun Du, Chin-Hui Lee

Comments: Submitted to ICASSP 2021. Code available: this https URL

Journal-ref: ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[124] arXiv:2011.01460 (cross-list from cs.LG) [pdf, other]: Title: Training Wake Word Detection with Synthesized Speech Data on Confusion Words

Authors: Yan Jia, Zexin Cai, Murong Ma, Zeqing Zhao, Xuyang Wang, Junjie Wang, Ming Li

Comments: Submitted to ICASSP 2021

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2011.01518 (cross-list from cs.SD) [pdf, other]: Title: ShaneRun System Description to VoxCeleb Speaker Recognition Challenge 2020

Authors: Shen Chen

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[126] arXiv:2011.01561 (cross-list from cs.SD) [pdf, other]: Title: Two Heads Are Better Than One: A Two-Stage Approach for Monaural Noise Reduction in the Complex Domain

Authors: Andong Li, Chengshi Zheng, Renhua Peng, Xiaodong Li

Comments: Submitted to ICASSP 2021, 5 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127] arXiv:2011.01709 (cross-list from cs.SD) [pdf, other]: Title: Small footprint Text-Independent Speaker Verification for Embedded Systems

Authors: Julien Balian, Raffaele Tavarone, Mathieu Poumeyrol, Alice Coucke

Journal-ref: Acoustics, Speech and Signal Processing (ICASSP), 2021 IEEE International Conference

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[128] arXiv:2011.01761 (cross-list from cs.LG) [pdf, other]: Title: Problems using deep generative models for probabilistic audio source separation

Authors: Maurice Frank, Maximilian Ilse

Journal-ref: 1st I Can't Believe It's Not Better Workshop (ICBINB @ NeurIPS 2020), Vancouver, Canada

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129] arXiv:2011.02099 (cross-list from cs.CL) [pdf, other]: Title: Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework

Authors: Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Comments: Accepted at INTERSPEECH 2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130] arXiv:2011.02110 (cross-list from cs.SD) [pdf, other]: Title: Can We Trust Deep Speech Prior?

Authors: Ying Shi, Haolin Chen, Zhiyuan Tang, Lantian Li, Dong Wang, Jiqing Han

Comments: To be published in IEEE SLT 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[131] arXiv:2011.02126 (cross-list from cs.CL) [pdf, other]: Title: Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time

Authors: Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura

Comments: Accepted in INTERSPEECH 2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132] arXiv:2011.02127 (cross-list from cs.CL) [pdf, other]: Title: Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition

Authors: Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Comments: Accepted in INTERSPEECH 2019

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133] arXiv:2011.02128 (cross-list from cs.CL) [pdf, other]: Title: Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis

Authors: Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura

Comments: Accepted in SLTU-CCURL 2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134] arXiv:2011.02131 (cross-list from cs.SD) [pdf, other]: Title: DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation

Authors: Yihui Fu, Jian Wu, Yanxin Hu, Mengtao Xing, Lei Xie

Comments: Accepted at IEEE SLT 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2011.02160 (cross-list from cs.CL) [pdf, other]: Title: Data Augmentation for End-to-end Code-switching Speech Recognition

Authors: Chenpeng Du, Hao Li, Yizhou Lu, Lan Wang, Yanmin Qian

Comments: Accepted by SLT2021

Journal-ref: 2021 IEEE Spoken Language Technology Workshop (SLT)

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[136] arXiv:2011.02198 (cross-list from cs.SD) [pdf, other]: Title: IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines

Authors: Yihui Fu, Zhuoyuan Yao, Weipeng He, Jian Wu, Xiong Wang, Zhanheng Yang, Shimin Zhang, Lei Xie, Dongyan Huang, Hui Bu, Petr Motlicek, Jean-Marc Odobez

Comments: Accepted at IEEE SLT 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2011.02314 (cross-list from cs.SD) [pdf, other]: Title: VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

Authors: Kun Zhou, Berrak Sisman, Haizhou Li

Comments: Accepted by IEEE SLT 2021. arXiv admin note: text overlap with arXiv:2005.07025

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[138] arXiv:2011.02329 (cross-list from cs.SD) [pdf, other]: Title: Single channel voice separation for unknown number of speakers under reverberant and noisy settings

Authors: Shlomo E. Chazan, Lior Wolf, Eliya Nachmani, Yossi Adi

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[139] arXiv:2011.02678 (cross-list from cs.SD) [pdf, other]: Title: BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers

Authors: Eunjung Han, Chul Lee, Andreas Stolcke

Journal-ref: Proc. IEEE ICASSP, June 2021, pp. 7193-7197

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[140] arXiv:2011.02809 (cross-list from cs.SD) [pdf, other]: Title: Semi-supervised Learning for Singing Synthesis Timbre

Authors: Jordi Bonada, Merlijn Blaauw

Comments: 5 pages, 1 figure, submitted to ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[141] arXiv:2011.02874 (cross-list from cs.SD) [pdf, ps, other]: Title: Influence of Event Duration on Automatic Wheeze Classification

Authors: Bruno M. Rocha, Diogo Pessoa, Alda Marques, Paulo Carvalho, Rui Pedro Paiva

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[142] arXiv:2011.02882 (cross-list from cs.SD) [pdf, ps, other]: Title: Query Expansion System for the VoxCeleb Speaker Recognition Challenge 2020

Authors: Yu-Sen Cheng, Chun-Liang Shih, Tien-Hong Lo, Wen-Ting Tseng, Berlin Chen

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[143] arXiv:2011.03028 (cross-list from cs.SD) [pdf, other]: Title: From Note-Level to Chord-Level Neural Network Models for Voice Separation in Symbolic Music

Authors: Patrick Gray, Razvan Bunescu

Comments: Paper submitted for publication in August 2018

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[144] arXiv:2011.03072 (cross-list from cs.CL) [pdf, other]: Title: Alignment Restricted Streaming Recurrent Neural Network Transducer

Authors: Jay Mahadeokar, Yuan Shangguan, Duc Le, Gil Keren, Hang Su, Thong Le, Ching-Feng Yeh, Christian Fuegen, Michael L. Seltzer

Comments: Accepted for presentation at IEEE Spoken Language Technology Workshop (SLT) 2021

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145] arXiv:2011.03109 (cross-list from cs.CL) [pdf, other]: Title: Improving RNN Transducer Based ASR with Auxiliary Tasks

Authors: Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig

Comments: Accepted for publication at IEEE Spoken Language Technology Workshop (SLT), 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146] arXiv:2011.03414 (cross-list from cs.SD) [pdf, ps, other]: Title: Robust ENF Estimation Based on Harmonic Enhancement and Maximum Weight Clique

Authors: Guang Hua, Han Liao, Haijian Zhang, Dengpan Ye, Jiayi Ma

Journal-ref: IEEE Transactions on Information Forensics and Security, 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[147] arXiv:2011.03530 (cross-list from cs.CV) [pdf, other]: Title: Large-scale multilingual audio visual dubbing

Authors: Yi Yang, Brendan Shillingford, Yannis Assael, Miaosen Wang, Wendi Liu, Yutian Chen, Yu Zhang, Eren Sezener, Luis C. Cobo, Misha Denil, Yusuf Aytar, Nando de Freitas

Comments: 26 pages, 8 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2011.03568 (cross-list from cs.CL) [pdf, other]: Title: Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

Authors: Ron J. Weiss, RJ Skerry-Ryan, Eric Battenberg, Soroosh Mariooryad, Diederik P. Kingma

Comments: 6 pages including supplement, 3 figures. accepted to ICASSP 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2011.03682 (cross-list from cs.SD) [pdf, other]: Title: Non-local convolutional neural networks (nlcnn) for speaker recognition

Authors: Haici Yang, Hongda Mao, Ruirui Li, Chelsea J.T. Ju, Oguz Elibol

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150] arXiv:2011.03689 (cross-list from cs.SD) [pdf, other]: Title: Detection and Evaluation of human and machine generated speech in spoofing attacks on automatic speaker verification systems

Authors: Yang Gao, Jiachen Lian, Bhiksha Raj, Rita Singh

Comments: 6 pages excluding references. Paper accepted by IEEE Spoken Language Technology (SLT) 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[151] arXiv:2011.03840 (cross-list from cs.SD) [pdf, other]: Title: Dual Application of Speech Enhancement for Automatic Speech Recognition

Authors: Ashutosh Pandey, Chunxi Liu, Yun Wang, Yatharth Saraf

Comments: Accepted for publication in SLT 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2011.03955 (cross-list from cs.SD) [pdf, other]: Title: Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation

Authors: Yang Ai, Haoyu Li, Xin Wang, Junichi Yamagishi, Zhenhua Ling

Comments: Accepted by SLT 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153] arXiv:2011.04004 (cross-list from cs.CL) [pdf, other]: Title: Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models

Authors: Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2011.04092 (cross-list from cs.SD) [pdf, other]: Title: Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain

Authors: Koen Oostermeijer, Qing Wang, Jun Du

Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[155] arXiv:2011.04249 (cross-list from cs.SD) [pdf, other]: Title: Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

Authors: Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[156] arXiv:2011.04292 (cross-list from cs.SD) [pdf, ps, other]: Title: STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model

Authors: Ryandhimas E. Zezario, Szu-Wei Fu, Chiou-Shann Fuh, Yu Tsao, Hsin-Min Wang

Comments: Accepted in APSIPA 2020

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[157] arXiv:2011.04297 (cross-list from cs.SD) [pdf, other]: Title: Knowledge Distillation for Singing Voice Detection

Authors: Soumava Paul, Gurunath Reddy M, K Sreenivasa Rao, Partha Pratim Das

Comments: Accepted at INTERSPEECH 2021. 5 pages, 3 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[158] arXiv:2011.04299 (cross-list from cs.SD) [pdf, other]: Title: COVID-19 Patient Detection from Telephone Quality Speech Data

Authors: Kotra Venkata Sai Ritwik, Shareef Babu Kalluri, Deepu Vijayasenan

Comments: 6 pages, 7 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[159] arXiv:2011.04491 (cross-list from cs.SD) [pdf, other]: Title: Masked Proxy Loss For Text-Independent Speaker Verification

Authors: Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh

Comments: Accepted at Interspeech 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[160] arXiv:2011.04547 (cross-list from cs.SD) [pdf, ps, other]: Title: Data Augmentation For Children's Speech Recognition -- The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge

Authors: Guoguo Chen, Xingyu Na, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Sifan Ma, Yujun Wang

Comments: System description of the SLT 2021 Children Speech Recognition Challenge

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[161] arXiv:2011.04568 (cross-list from cs.SD) [pdf, ps, other]: Title: Musical analysis of Stravinski's "The Rite of Spring" based on computational methods

Authors: Germán Ruiz-Marcos

Comments: Audio and Music Processing Lab, 2017

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162] arXiv:2011.04609 (cross-list from cs.SD) [pdf, other]: Title: FRILL: A Non-Semantic Speech Embedding for Mobile Devices

Authors: Jacob Peplinski, Joel Shor, Sachin Joglekar, Jake Garrison, Shwetak Patel

Comments: Accepted to Interspeech 2021

Journal-ref: Proc. Interspeech 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[163] arXiv:2011.04696 (cross-list from cs.SD) [pdf, other]: Title: Speaker De-identification System using Autoencoders and Adversarial Training

Authors: Fernando M. Espinoza-Cuadros, Juan M. Perero-Codosero, Javier Antón-Martín, Luis A. Hernández-Gómez

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[164] arXiv:2011.04906 (cross-list from cs.CL) [pdf, other]: Title: On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers

Authors: Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals

Comments: arXiv admin note: substantial text overlap with arXiv:2005.13895

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2011.04974 (cross-list from cs.SD) [pdf, other]: Title: Deconstruct and Reconstruct Dizi Music of the Northern School and the Southern School

Authors: Yifan Xie, Rongfeng Li

Comments: Best Student Paper in The 8th Conference on Sound and Music Technology (CSMT)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2011.05158 (cross-list from cs.SD) [pdf, other]: Title: GANterpretations

Authors: Pablo Samuel Castro

Comments: In 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[167] arXiv:2011.05189 (cross-list from cs.SD) [pdf, other]: Title: Supervised attention for speaker recognition

Authors: Seong Min Kye, Joon Son Chung, Hoirin Kim

Comments: SLT 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2011.05463 (cross-list from cs.CL) [pdf, other]: Title: Deep Sound Change: Deep and Iterative Learning, Convolutional Neural Networks, and Language Change

Authors: Gašper Beguš

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2011.05585 (cross-list from cs.LG) [pdf, other]: Title: Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning

Authors: Jonathan Boigne, Biman Liyanage, Ted Östrem

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2011.05591 (cross-list from cs.SD) [pdf, other]: Title: Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Authors: Cunhang Fan, Bin Liu, Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Leichao Song

Comments: Accepted by ISCSLP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[171] arXiv:2011.06380 (cross-list from cs.SD) [pdf, other]: Title: Automatic Neural Lyrics and Melody Composition

Authors: Gurunath Reddy Madhumani, Yi Yu, Florian Harscoët, Simon Canales, Suhua Tang

Comments: 15 pages

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[172] arXiv:2011.06392 (cross-list from cs.SD) [pdf, other]: Title: Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement

Authors: Hamed Hemati, Damian Borth

Comments: Preprint

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[173] arXiv:2011.06724 (cross-list from cs.SD) [pdf, other]: Title: The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines

Authors: Fan Yu, Zhuoyuan Yao, Xiong Wang, Keyu An, Lei Xie, Zhijian Ou, Bo Liu, Xiulin Li, Guanqiong Miao

Comments: 7 pages, 3 figures, 3 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2011.06801 (cross-list from cs.SD) [pdf, other]: Title: A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions

Authors: Shulei Ji, Jing Luo, Xinyu Yang

Comments: 96 pages, this is a draft

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[175] arXiv:2011.06846 (cross-list from cs.LG) [pdf, other]: Title: Low-activity supervised convolutional spiking neural networks applied to speech commands recognition

Authors: Thomas Pellegrini, Romain Zimmer, Timothée Masquelier

Comments: Accepted to IEEE Spoken Language Technology Workshop 2021

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176] arXiv:2011.07348 (cross-list from cs.SD) [pdf, other]: Title: Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays

Authors: Jonah Casebeer, Jamshed Kaikaus, Paris Smaragdis

Comments: 5 pages, 4 figures, ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2011.07430 (cross-list from cs.CV) [pdf, other]: Title: Audio-Visual Event Recognition through the lens of Adversary

Authors: Juncheng B Li, Kaixin Ma, Shuhui Qu, Po-Yao Huang, Florian Metze

Comments: 4 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2011.07442 (cross-list from cs.SD) [pdf, other]: Title: Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information

Authors: Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao

Comments: To appear in IEEE Transactions on Audio, Speech and Language Processing (TASLP)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[179] arXiv:2011.07542 (cross-list from cs.SD) [pdf, other]: Title: Automatic and perceptual discrimination between dysarthria, apraxia of speech, and neurotypical speech

Authors: I. Kodrasi, M. Pernon, M. Laganaro, H. Bourlard

Comments: ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[180] arXiv:2011.07546 (cross-list from cs.SD) [pdf, other]: Title: Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment

Authors: Ruchit Agrawal, Simon Dixon

Comments: Accepted at EUSIPCO 2020

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[181] arXiv:2011.07616 (cross-list from cs.SD) [pdf, other]: Title: Unsupervised Contrastive Learning of Sound Event Representations

Authors: Eduardo Fonseca, Diego Ortego, Kevin McGuinness, Noel E. O'Connor, Xavier Serra

Comments: A 4-page version is submitted to ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[182] arXiv:2011.07754 (cross-list from cs.CL) [pdf, other]: Title: Deep Shallow Fusion for RNN-T Personalization

Authors: Duc Le, Gil Keren, Julian Chan, Jay Mahadeokar, Christian Fuegen, Michael L. Seltzer

Comments: To appear at SLT 2021

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[183] arXiv:2011.07953 (cross-list from cs.CV) [pdf, other]: Title: Shimon the Robot Film Composer and DeepScore: An LSTM for Generation of Film Scores based on Visual Analysis

Authors: Richard Savery, Gil Weinberg

Comments: Computer Simulation of Musical Creativity, 20th-22nd August, University College Dublin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184] arXiv:2011.08238 (cross-list from cs.CL) [pdf, ps, other]: Title: End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features

Authors: Edmilson Morais, Hong-Kwang J. Kuo, Samuel Thomas, Zoltan Tuske, Brian Kingsbury

Comments: 5 pages, 3 tables and 1 figure

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2011.08467 (cross-list from cs.SD) [pdf, other]: Title: Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher

Authors: Heyang Xue, Shan Yang, Yi Lei, Lei Xie, Xiulin Li

Comments: 8 pages, 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186] arXiv:2011.08469 (cross-list from cs.SD) [pdf, other]: Title: Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter

Authors: Xiong Wang, Zhuoyuan Yao, Xian Shi, Lei Xie

Comments: 7 pages, 3 figures, 5 tables

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[187] arXiv:2011.08477 (cross-list from cs.SD) [pdf, other]: Title: Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis

Authors: Yi Lei, Shan Yang, Lei Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2011.08483 (cross-list from cs.SD) [pdf, other]: Title: FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances

Authors: Ali Shahin Shamsabadi, Francisco Sepúlveda Teixeira, Alberto Abad, Bhiksha Raj, Andrea Cavallaro, Isabel Trancoso

Comments: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[189] arXiv:2011.08548 (cross-list from cs.SD) [pdf, other]: Title: Optimizing voice conversion network with cycle consistency loss of speaker identity

Authors: Hongqiang Du, Xiaohai Tian, Lei Xie, Haizhou Li

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[190] arXiv:2011.08609 (cross-list from cs.SD) [pdf, other]: Title: Accent and Speaker Disentanglement in Many-to-many Voice Conversion

Authors: Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li

Comments: Accepted to ISCSLP2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2011.08623 (cross-list from cs.SD) [pdf, other]: Title: Adversarial Training for Multi-domain Speaker Recognition

Authors: Qing Wang, Wei Rao, Pengcheng Guo, Lei Xie

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192] arXiv:2011.08679 (cross-list from cs.SD) [pdf, other]: Title: Controllable Emotion Transfer For End-to-End Speech Synthesis

Authors: Tao Li, Shan Yang, Liumeng Xue, Lei Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2011.09078 (cross-list from cs.SD) [pdf, other]: Title: Vertical-Horizontal Structured Attention for Generating Music with Chords

Authors: Yizhou Zhao, Liang Qiu, Wensi Ai, Feng Shi, Song-Chun Zhu

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[194] arXiv:2011.09081 (cross-list from cs.SD) [pdf, other]: Title: Multi-Channel Automatic Speech Recognition Using Deep Complex Unet

Authors: Yuxiang Kong, Jian Wu, Quandong Wang, Peng Gao, Weiji Zhuang, Yujun Wang, Lei Xie

Comments: 7 pages, 4 figures, IEEE SLT 2021 Technical Committee

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2011.09143 (cross-list from cs.SD) [pdf, other]: Title: Expanding Access to Music Technology -- Rapid Prototyping Accessible Instrument Solutions For Musicians With Intellectual Disabilities

Authors: Quinn Jarvis-Holland, Crystal Cortez, Nathan (Station) Gamill, Francisco Botello

Comments: Proceedings of the International Conference on New Interfaces for Musical Expression, 2020

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[196] arXiv:2011.09272 (cross-list from cs.CL) [pdf, other]: Title: Combining Prosodic, Voice Quality and Lexical Features to Automatically Detect Alzheimer's Disease

Authors: Mireia Farrús, Joan Codina-Filbà

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197] arXiv:2011.09299 (cross-list from cs.SD) [pdf, other]: Title: CAA-Net: Conditional Atrous CNNs with Attention for Explainable Device-robust Acoustic Scene Classification

Authors: Zhao Ren, Qiuqiang Kong, Jing Han, Mark D. Plumbley, Björn W. Schuller

Comments: IEEE Transactions on Multimedia

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[198] arXiv:2011.09301 (cross-list from cs.SD) [pdf, other]: Title: Context-aware RNNLM Rescoring for Conversational Speech Recognition

Authors: Kun Wei, Pengcheng Guo, Hang Lv, Zhen Tu, Lei Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[199] arXiv:2011.09744 (cross-list from cs.LG) [pdf, other]: Title: End-To-End Dilated Variational Autoencoder with Bottleneck Discriminative Loss for Sound Morphing -- A Preliminary Study

Authors: Matteo Lionello, Hendrik Purwins

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200] arXiv:2011.09767 (cross-list from cs.SD) [pdf, other]: Title: Deep Residual Local Feature Learning for Speech Emotion Recognition

Authors: Sattaya Singkul, Thakorn Chatchaisathaporn, Boontawee Suntisrivaraporn, Kuntpong Woraratpanya

Comments: 12 pages, 5 figures, submitted for review

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[201] arXiv:2011.10233 (cross-list from cs.SD) [pdf, other]: Title: One Shot Learning for Speech Separation

Authors: Yuan-Kuei Wu, Kuan-Po Huang, Yu Tsao, Hung-yi Lee

Comments: Accepted to ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[202] arXiv:2011.10469 (cross-list from cs.LG) [pdf, other]: Title: Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder

Authors: Sam Davis, Giuseppe Coccia, Sam Gooch, Julian Mack

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[203] arXiv:2011.10710 (cross-list from cs.SD) [pdf, other]: Title: Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification

Authors: Xiaoyi Qin, Yaogen Yang, Lin Yang, Xuyang Wang, Junjie Wang, Ming Li

Comments: Submitted to ICASSP2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[204] arXiv:2011.11436 (cross-list from cs.SD) [pdf, other]: Title: Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer

Authors: Mohammad Soltanian, Junaid Malik, Jenni Raitoharju, Alexandros Iosifidis, Serkan Kiranyaz, Moncef Gabbouj

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[205] arXiv:2011.11588 (cross-list from cs.CL) [pdf, other]: Title: The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

Authors: Tu Anh Nguyen, Maureen de Seyssel, Patricia Rozé, Morgane Rivière, Evgeny Kharitonov, Alexei Baevski, Ewan Dunbar, Emmanuel Dupoux

Comments: 14 pages, including references and supplementary material

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[206] arXiv:2011.11715 (cross-list from cs.CL) [pdf, other]: Title: Multi-task Language Modeling for Improving Speech Recognition of Rare Words

Authors: Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko

Comments: Accepted to IEEE Automatic Speech Recognition and Understanding (ASRU) 2021

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[207] arXiv:2011.11970 (cross-list from cs.SD) [pdf, other]: Title: A Novel Multimodal Music Genre Classifier using Hierarchical Attention and Convolutional Neural Network

Authors: Manish Agrawal, Abhilash Nandy

Comments: 7 pages, 4 figures

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[208] arXiv:2011.12022 (cross-list from cs.SD) [pdf, other]: Title: Multi-Decoder DPRNN: High Accuracy Source Counting and Separation

Authors: Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson

Comments: Project Page: this https URL. Submitted to ICASSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[209] arXiv:2011.12536 (cross-list from cs.SD) [pdf, ps, other]: Title: Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding

Authors: Achintya kr. Sarkar, Zheng-Hua Tan (Senior Member, IEEE)

Comments: Copyright (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal-ref: IEEE Signal Processing Letters, vol. 28, pp. 364-368, 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[210] arXiv:2011.12596 (cross-list from cs.SD) [pdf, other]: Title: MTCRNN: A multi-scale RNN for directed audio texture synthesis

Authors: M. Huzaifah, L. Wyse

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[211] arXiv:2011.12649 (cross-list from cs.CL) [pdf, other]: Title: Neural Representations for Modeling Variation in Speech

Authors: Martijn Bartelds, Wietse de Vries, Faraz Sanal, Caitlin Richter, Mark Liberman, Martijn Wieling

Comments: Submitted to Journal of Phonetics

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[212] arXiv:2011.12985 (cross-list from cs.SD) [pdf, other]: Title: FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge

Authors: Bichen Wu, Qing He, Peizhao Zhang, Thilo Koehler, Kurt Keutzer, Peter Vajda

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[213] arXiv:2011.12999 (cross-list from cs.GR) [pdf, other]: Title: Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio

Authors: João P. Ferreira, Thiago M. Coutinho, Thiago L. Gomes, José F. Neto, Rafael Azevedo, Renato Martins, Erickson R. Nascimento

Comments: Accepted at the Elsevier Computers & Graphics (C&G) 2020

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[214] arXiv:2011.13122 (cross-list from cs.SD) [pdf, other]: Title: Real-time error correction and performance aid for MIDI instruments

Authors: Georgi Marinov

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[215] arXiv:2011.13148 (cross-list from cs.SD) [pdf, ps, other]: Title: Streaming end-to-end multi-talker speech recognition

Authors: Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

Comments: 5 pages, 3 figures. Accepted to IEEE Signal Processing Letters 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[216] arXiv:2011.13320 (cross-list from cs.SD) [pdf, ps, other]: Title: Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough

Authors: Gunvant Chaudhari, Xinyi Jiang, Ahmed Fakhry, Asriel Han, Jaclyn Xiao, Sabrina Shen, Amil Khanzada

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[217] arXiv:2011.13393 (cross-list from cs.SD) [pdf, other]: Title: Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

Authors: Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Comments: Accepted by ICASSP2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[218] arXiv:2011.13439 (cross-list from cs.CL) [pdf, other]: Title: Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training

Authors: Sameer Khurana, Niko Moritz, Takaaki Hori, Jonathan Le Roux

Comments: ICASSP 2021

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[219] arXiv:2011.13453 (cross-list from cs.SD) [pdf, other]: Title: Towards Movement Generation with Audio Features

Authors: Benedikte Wallace, Charles P. Martin, Jim Torresen, Kristian Nymoen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[220] arXiv:2011.13645 (cross-list from cs.CE) [pdf, ps, other]: Title: Numerical and experimental study of tonal noise sources at the outlet of an isolated centrifugal fan

Authors: Martin Ottersten, Hua-Dong Yao, Lars Davidson

Subjects: Computational Engineering, Finance, and Science (cs.CE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[221] arXiv:2011.14334 (cross-list from cs.SD) [pdf, other]: Title: Audio-visual Speech Separation with Adversarially Disentangled Visual Representation

Authors: Peng Zhang, Jiaming Xu, Jing Shi, Yunzhe Hao, Bo Xu

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[222] arXiv:2011.14336 (cross-list from cs.SD) [pdf, ps, other]: Title: An Features Extraction and Recognition Method for Underwater Acoustic Target Based on ATCNN

Authors: Gang Hu, Kejun Wang, Liangliang Liu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[223] arXiv:2011.14445 (cross-list from cs.SD) [pdf, other]: Title: Audio, Speech, Language, & Signal Processing for COVID-19: A Comprehensive Overview

Authors: Gauri Deshpande, Björn W. Schuller

Comments: arXiv admin note: text overlap with arXiv:2005.08579

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[224] arXiv:2011.14885 (cross-list from cs.SD) [pdf, ps, other]: Title: Look who's not talking

Authors: Youngki Kwon, Hee Soo Heo, Jaesung Huh, Bong-Jin Lee, Joon Son Chung

Comments: SLT 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[225] arXiv:2011.15003 (cross-list from cs.SD) [pdf, other]: Title: Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Authors: Christoph Boeddeker, Wangyou Zhang, Tomohiro Nakatani, Keisuke Kinoshita, Tsubasa Ochiai, Marc Delcroix, Naoyuki Kamo, Yanmin Qian, Reinhold Haeb-Umbach

Comments: Accepted by ICASSP 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[226] arXiv:2011.15023 (cross-list from cs.CL) [pdf, other]: Title: Transformer-Transducers for Code-Switched Speech Recognition

Authors: Siddharth Dalmia, Yuzong Liu, Srikanth Ronanki, Katrin Kirchhoff

Comments: Accepted at ICASSP 2021

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[227] arXiv:2011.15096 (cross-list from cs.HC) [pdf, other]: Title: A proposal and evaluation of new timbre visualisation methods for audio sample browsers

Authors: Etienne Richan, Jean Rouat

Comments: 14 pages. Personal and Ubiquitous Computing (2020)

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
