Sound

Authors and titles for cs.SD in Oct 2021

[ total of 324 entries: 1-322 | 323-324 ]

[1] arXiv:2110.00046 [pdf, other]: Title: SpliceOut: A Simple and Efficient Audio Augmentation Method

Authors: Arjit Jain, Pranay Reddy Samala, Deepak Mittal, Preethi Jyoti, Maneesh Singh

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2110.00155 [pdf, other]: Title: Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device

Authors: Zhouyuan Huo, Dongseong Hwang, Khe Chai Sim, Shefali Garg, Ananya Misra, Nikhil Siddhartha, Trevor Strohman, Françoise Beaufays

Comments: 5 pages

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2110.00570 [pdf, other]: Title: Leveraging Low-Distortion Target Estimates for Improved Speech Enhancement

Authors: Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux

Comments: in submission

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2110.00794 [pdf, other]: Title: Processing Phoneme Specific Segments for Cleft Lip and Palate Speech Enhancement

Authors: Protima Nomo Sudro, Rohit Sinha, S. R. Mahadeva Prasanna

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[5] arXiv:2110.00940 [pdf, other]: Title: PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation Extraction

Authors: Yi Ma, Kong Aik Lee, Ville Hautamaki, Haizhou Li

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6] arXiv:2110.01009 [pdf, other]: Title: Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging

Authors: Zhiling Zhang, Zelin Zhou, Haifeng Tang, Guangwei Li, Mengyue Wu, Kenny Q. Zhu

Comments: CIKM 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:2110.01147 [pdf, other]: Title: On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

Authors: Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[8] arXiv:2110.01210 [pdf, other]: Title: Audio Captioning Using Sound Event Detection

Authors: Ayşegül Özkaya Eren, Mustafa Sert

Comments: Submitted to DCASE 2021 Challenge

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2110.01367 [pdf, other]: Title: Audio-Visual Evaluation of Oratory Skills

Authors: Tzvi Michelson, Shmuel Peleg

Comments: TransAI 2021

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[10] arXiv:2110.01425 [pdf, other]: Title: Building a Noisy Audio Dataset to Evaluate Machine Learning Approaches for Automatic Speech Recognition Systems

Authors: Julio Cesar Duarte, Sérgio Colcher

Comments: Tech report series Monografias em Ciência da Computação, September 2021, Dep. Informática PUC-Rio, RJ, Brazil, ISSN 0103-9741

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[11] arXiv:2110.02011 [pdf, other]: Title: Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection

Authors: Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan, Kazushige Ouchi

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12] arXiv:2110.02375 [pdf, other]: Title: Interpreting intermediate convolutional layers in unsupervised acoustic word classification

Authors: Gašper Beguš, Alan Zhou

Comments: ICASSP 2022

Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[13] arXiv:2110.02411 [pdf, other]: Title: Voice Aging with Audio-Visual Style Transfer

Authors: Justin Wilson, Sunyeong Park, Seunghye J. Wilson, Ming C. Lin

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14] arXiv:2110.02584 [pdf, other]: Title: EdiTTS: Score-based Editing for Controllable Text-to-Speech

Authors: Jaesung Tae, Hyeongju Kim, Taesu Kim

Comments: 4 pages, 3 figures, 3 tables, INTERSPEECH 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:2110.02791 [pdf, other]: Title: Spell my name: keyword boosted speech recognition

Authors: Namkyu Jung, Geonmin Kim, Joon Son Chung

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[16] arXiv:2110.02878 [pdf, other]: Title: An Investigation of the Effectiveness of Phase for Audio Classification

Authors: Shunsuke Hidaka, Kohei Wakamiya, Tokihiko Kaburagi

Comments: 5 pages, 3 figures

Journal-ref: ICASSP (2022) 3708-3712

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[17] arXiv:2110.03156 [pdf, other]: Title: StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

Authors: Rui Liu, Berrak Sisman, Haizhou Li

Comments: Submitted to ICASSP 2022. 5 pages, 3 figures, 1 table. Our codes are available at: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18] arXiv:2110.03174 [pdf, other]: Title: Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study

Authors: Dawei Liang, Yangyang Shi, Yun Wang, Nayan Singhal, Alex Xiao, Jonathan Shaw, Edison Thomaz, Ozlem Kalinli, Mike Seltzer

Comments: Submitted to ICASSP 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[19] arXiv:2110.03183 [pdf, other]: Title: Attention is All You Need? Good Embeddings with Statistics are enough:Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....

Authors: Prateek Verma

Comments: IEEE Copyright: written as told

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[20] arXiv:2110.03243 [pdf, ps, other]: Title: Sound Event Detection Guided by Semantic Contexts of Scenes

Authors: Noriyuki Tonami, Keisuke Imoto, Ryotaro Nagase, Yuki Okamoto, Takahiro Fukumori, Yoichi Yamashita

Comments: Accepted to ICASSP 2022

Subjects: Sound (cs.SD)
[21] arXiv:2110.03251 [pdf, other]: Title: A Cough-based deep learning framework for detecting COVID-19

Authors: Truong Hoang, Lam Pham, Dat Ngo, Hoang D. Nguyen

Comments: COVID-19, EMBC-2022, DiCOVA, top 2nd, benchmark on Spec > 0.95%

Journal-ref: EMBC 44 (2022) 3422-3425

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2110.03272 [pdf, ps, other]: Title: A Novel Blind Source Separation Framework Towards Maximum Signal-To-Interference Ratio

Authors: Jianju Gu, Longbiao Cheng, Dingding Yao, Junfeng Li, Yonghong Yan

Comments: 5 pages, 2 figures

Subjects: Sound (cs.SD)
[23] arXiv:2110.03370 [pdf, other]: Title: WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

Authors: Binbin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng

Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[24] arXiv:2110.03380 [pdf, other]: Title: Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity

Authors: You Jin Kim, Hee-Soo Heo, Jee-weon Jung, Youngki Kwon, Bong-Jin Lee, Joon Son Chung

Comments: This paper was submitted to ICASSP 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[25] arXiv:2110.03390 [pdf, other]: Title: GANtron: Emotional Speech Synthesis with Generative Adversarial Networks

Authors: Enrique Hortal, Rodrigo Brechard Alarcia

Comments: 9 pages, 4 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[26] arXiv:2110.03414 [pdf, other]: Title: SERAB: A multi-lingual benchmark for speech emotion recognition

Authors: Neil Scheidwasser-Clow, Mikolaj Kegler, Pierre Beckmann, Milos Cernak

Comments: Submitted to ICASSP 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27] arXiv:2110.03536 [pdf, other]: Title: Prototype Learning for Interpretable Respiratory Sound Analysis

Authors: Zhao Ren, Thanh Tam Nguyen, Wolfgang Nejdl

Comments: Technical report of the paper accepted by IEEE ICASSP 2022

Subjects: Sound (cs.SD)
[28] arXiv:2110.03744 [pdf, other]: Title: Voice Reenactment with F0 and timing constraints and adversarial learning of conversions

Authors: Frederik Bous, Laurent Benaroya, Nicolas Obin, Axel Roebel

Comments: arXiv admin note: text overlap with arXiv:2107.12346

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29] arXiv:2110.03771 [pdf, other]: Title: Wake-Cough: cough spotting and cougher identification for personalised long-term cough monitoring

Authors: Madhurananda Pahar, Marisa Klopper, Byron Reeve, Rob Warren, Grant Theron, Andreas Diacon, Thomas Niesler

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30] arXiv:2110.04057 [pdf, other]: Title: FAST-RIR: Fast neural diffuse room impulse response generator

Authors: Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

Comments: Accepted to ICASSP 2022. More results and source code are available at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[31] arXiv:2110.04091 [pdf, other]: Title: Affective Burst Detection from Speech using Kernel-fusion Dilated Convolutional Neural Networks

Authors: Berkay Kopru, Engin Erzin

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[32] arXiv:2110.04284 [pdf, other]: Title: Auto-DSP: Learning to Optimize Acoustic Echo Cancellers

Authors: Jonah Casebeer, Nicholas J. Bryan, Paris Smaragdis

Comments: Accepted to the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Source code and audio examples: this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2110.04438 [pdf, other]: Title: Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification

Authors: Qingjian Lin, Lin Yang, Xuyang Wang, Xiaoyi Qin, Junjie Wang, Ming Li

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2110.04451 [pdf, other]: Title: Using multiple reference audios and style embedding constraints for speech synthesis

Authors: Cheng Gong, Longbiao Wang, Zhenhua Ling, Ju Zhang, Jianwu Dang

Comments: 5 pages, 3 figures, submitted to ICASSP 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[35] arXiv:2110.04474 [pdf, other]: Title: A Mutual learning framework for Few-shot Sound Event Detection

Authors: Dongchao Yang, Helin Wang, Yuexian Zou, Zhongjie Ye, Wenwu Wang

Comments: Accepted by ICASSP2022. arXiv admin note: text overlap with arXiv:2106.12252 by other authors

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:2110.04486 [pdf, other]: Title: PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control

Authors: Yunchao He, Jian Luan, Yujun Wang

Comments: Accepted by ICASSP 2022. 5 pages, 4 figures, 3 tables. Audio samples are available at: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[37] arXiv:2110.04621 [pdf, other]: Title: Universal Paralinguistic Speech Representations Using Self-Supervised Conformers

Authors: Joel Shor, Aren Jansen, Wei Han, Daniel Park, Yu Zhang

Journal-ref: ICASSP 2022-2022 IEEE

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38] arXiv:2110.04656 [pdf, other]: Title: Streaming on-device detection of device directed speech from voice and touch-based invocation

Authors: Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[39] arXiv:2110.04678 [pdf, other]: Title: An Overview of Techniques for Biomarker Discovery in Voice Signal

Authors: Rita Singh, Ankit Shah, Hira Dhamyal

Comments: Last two authors contributed equally to the paper

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[40] arXiv:2110.04684 [pdf, other]: Title: Can Audio Captions Be Evaluated with Image Caption Metrics?

Authors: Zelin Zhou, Zhiling Zhang, Xuenan Xu, Zeyu Xie, Mengyue Wu, Kenny Q. Zhu

Comments: ICASSP 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[41] arXiv:2110.04754 [pdf, other]: Title: Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding

Authors: Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Yibiao Yu, Zejun Ma

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[42] arXiv:2110.04765 [pdf, other]: Title: Multi-task Learning with Metadata for Music Mood Classification

Authors: Rajnish Kumar, Manjeet Dahiya

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[43] arXiv:2110.04946 [pdf, other]: Title: LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example

Authors: Hieu-Thi Luong, Junichi Yamagishi

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44] arXiv:2110.04972 [pdf, ps, other]: Title: Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations

Authors: Ryosuke Horiuchi, Shoichi Koyama, Juliano G. C. Ribeiro, Natsuki Ueno, Hiroshi Saruwatari

Comments: Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45] arXiv:2110.05020 [pdf, other]: Title: MELONS: generating melody with long-term structure using transformers and structure graph

Authors: Yi Zou, Pei Zou, Yi Zhao, Kaixiang Zhang, Ran Zhang, Xiaorui Wang

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[46] arXiv:2110.05033 [pdf, other]: Title: Pitch Preservation In Singing Voice Synthesis

Authors: Shujun Liu, Hai Zhu, Kun Wang, Huajun Wang

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[47] arXiv:2110.05042 [pdf, other]: Title: Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification

Authors: Miao Zhao, Yufeng Ma, Yiwei Ding, Yu Zheng, Min Liu, Minqiang Xu

Comments: submitted to ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48] arXiv:2110.05054 [pdf, other]: Title: Source Mixing and Separation Robust Audio Steganography

Authors: Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

Comments: Accepted to ICASSP 2022

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[49] arXiv:2110.05059 [pdf, other]: Title: Amicable examples for informed source separation

Authors: Naoya Takahashi, Yuki Mitsufuji

Comments: Accepted to ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2110.05069 [pdf, other]: Title: Efficient Training of Audio Transformers with Patchout

Authors: Khaled Koutini, Jan Schlüter, Hamid Eghbal-zadeh, Gerhard Widmer

Comments: Submitted to Interspeech 2022. Source code: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[51] arXiv:2110.05087 [pdf, ps, other]: Title: A Multi-Resolution Front-End for End-to-End Speech Anti-Spoofing

Authors: Wei Liu, Meng Sun, Xiongwei Zhang, Hugo Van hamme, Thomas Fang Zheng

Comments: submitted to ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52] arXiv:2110.05580 [pdf, other]: Title: vocadito: A dataset of solo vocals with $f_0$, note, and lyric annotations

Authors: Rachel M. Bittner, Katherine Pasalo, Juan José Bosch, Gabriel Meseguer-Brocal, David Rubinstein

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53] arXiv:2110.05587 [pdf, other]: Title: Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes

Authors: Karn N. Watcharasupat, Alexander Lerch

Comments: Submitted to the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[54] arXiv:2110.05713 [pdf, other]: Title: Foster Strengths and Circumvent Weaknesses: a Speech Enhancement Framework with Two-branch Collaborative Learning

Authors: Wenxin Tai, Jiajia Li, Yixiang Wang, Tian Lan, Qiao Liu

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2110.05765 [pdf, other]: Title: Music Sentiment Transfer

Authors: Miles Sigel, Michael Zhou, Jiebo Luo

Comments: NSF REU: Computational Methods for Understanding Music, Media, and Minds, University of Rochester

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[56] arXiv:2110.05777 [pdf, other]: Title: Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification

Authors: Zhengyang Chen, Sanyuan Chen, Yu Wu, Yao Qian, Chengyi Wang, Shujie Liu, Yanmin Qian, Michael Zeng

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57] arXiv:2110.05798 [pdf, other]: Title: Adapting TTS models For New Speakers using Transfer Learning

Authors: Paarth Neekhara, Jason Li, Boris Ginsburg

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[58] arXiv:2110.05866 [pdf, ps, other]: Title: MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

Authors: Szu-Wei Fu, Cheng Yu, Kuo-Hsuan Hung, Mirco Ravanelli, Yu Tsao

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[59] arXiv:2110.05966 [pdf, other]: Title: Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training

Authors: Changsheng Quan, Xiaofei Li

Comments: accepted by ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:2110.05975 [pdf, other]: Title: Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays

Authors: Chengdong Liang, Yijiang Chen, Jiadi Yao, Xiao-Lei Zhang

Comments: 5 pages, 3 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61] arXiv:2110.06100 [pdf, other]: Title: Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information

Authors: Zhongjie Ye, Helin Wang, Dongchao Yang, Yuexian Zou

Comments: 5 pages, 1 figure, accepted by DCASE 2021 workshop

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[62] arXiv:2110.06123 [pdf, other]: Title: COVID-19 Diagnosis from Cough Acoustics using ConvNets and Data Augmentation

Authors: Saranga Kingkor Mahanta, Darsh Kaushik, Shubham Jain, Hoang Van Truong, Koushik Guha

Comments: DiCOVA, top 1st, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63] arXiv:2110.06280 [pdf, other]: Title: S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations

Authors: Wen-Chin Huang, Shu-Wen Yang, Tomoki Hayashi, Hung-Yi Lee, Shinji Watanabe, Tomoki Toda

Comments: Submitted to ICASSP 2022. Code available at: this https URL

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64] arXiv:2110.06323 [pdf, other]: Title: An Annihilating Filter-Based DOA Estimation for Uniform Linear Array

Authors: Son Phan, Lam Pham

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65] arXiv:2110.06371 [pdf, other]: Title: Algorithmic Composition by Autonomous Systems with Multiple Time-Scales

Authors: Risto Holopainen

Comments: 28 pages, 3 figures. Submitted to Divergence Press

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Adaptation and Self-Organizing Systems (nlin.AO)
[66] arXiv:2110.06467 [pdf, other]: Title: Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement

Authors: Guochen Yu, Andong Li, Chengshi Zheng, Yinuo Guo, Yutian Wang, Hui Wang

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[67] arXiv:2110.06494 [pdf, other]: Title: Music Source Separation with Deep Equilibrium Models

Authors: Yuichiro Koyama, Naoki Murata, Stefan Uhlich, Giorgio Fabbro, Shusuke Takahashi, Yuki Mitsufuji

Comments: 5 pages, 4 figures, accepted for publication in IEEE ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68] arXiv:2110.06501 [pdf, other]: Title: Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

Authors: Yuichiro Koyama, Kazuhide Shigemi, Masafumi Takahashi, Kazuki Shimada, Naoya Takahashi, Emiru Tsunoo, Shusuke Takahashi, Yuki Mitsufuji

Comments: 5 pages, 2 figures, accepted for publication in IEEE ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69] arXiv:2110.06525 [pdf, other]: Title: Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks

Authors: Bo-Yu Chen, Wei-Han Hsu, Wei-Hsiang Liao, Marco A. Martínez Ramírez, Yuki Mitsufuji, Yi-Hsuan Yang

Comments: To be published at ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[70] arXiv:2110.06534 [pdf, other]: Title: Simple Attention Module based Speaker Verification with Iterative noisy label detection

Authors: Xiaoyi Qin, Na Li, Chao Weng, Dan Su, Ming Li

Comments: submitted to ICASSP2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71] arXiv:2110.06543 [pdf, ps, other]: Title: EIHW-MTG DiCOVA 2021 Challenge System Report

Authors: Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72] arXiv:2110.06565 [pdf, other]: Title: Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning

Authors: Li Zhang, Qing Wang, Lei Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2110.06634 [pdf, other]: Title: End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network

Authors: Yina Guo, Xiaofei Zhang, Zhenying Gong, Anhong Wang, Wenwu Wang

Comments: 12 pages, 13 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[74] arXiv:2110.06707 [pdf, other]: Title: Singer separation for karaoke content generation

Authors: Hsuan-Yu Chen, Xuanjun Chen, Jyh-Shing Roger Jang

Comments: Submitted to ICASSP 2022

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[75] arXiv:2110.06999 [pdf, other]: Title: Study of positional encoding approaches for Audio Spectrogram Transformers

Authors: Leonardo Pepino, Pablo Riera, Luciana Ferrer

Comments: Submitted to ICASSP 2022. 5 pages, 3 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[76] arXiv:2110.07027 [pdf, other]: Title: Comparison of SVD and factorized TDNN approaches for speech to text

Authors: Jeffrey Josanne Michael, Nagendra Kumar Goel, Navneeth K, Jonas Robertson, Shravan Mishra

Comments: 4 pages, 1 figure, 3 tables

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[77] arXiv:2110.07210 [pdf, other]: Title: Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data

Authors: Haitong Zhang, Yue Lin

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[78] arXiv:2110.07311 [pdf, other]: Title: SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs

Authors: Adrián Barahona-Ríos, Tom Collins

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[79] arXiv:2110.07313 [pdf, other]: Title: Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

Authors: Sangeeta Srivastava, Yun Wang, Andros Tjandra, Anurag Kumar, Chunxi Liu, Kritika Singh, Yatharth Saraf

Comments: 4 pages. Submitted to ICASSP in Oct 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[80] arXiv:2110.07393 [pdf, other]: Title: M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge

Authors: Fan Yu, Shiliang Zhang, Yihui Fu, Lei Xie, Siqi Zheng, Zhihao Du, Weilong Huang, Pengcheng Guo, Zhijie Yan, Bin Ma, Xin Xu, Hui Bu

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2110.07607 [pdf, other]: Title: HumBugDB: A Large-scale Acoustic Mosquito Dataset

Authors: Ivan Kiskin, Marianne Sinka, Adam D. Cobb, Waqas Rafique, Lawrence Wang, Davide Zilli, Benjamin Gutteridge, Rinita Dam, Theodoros Marinos, Yunpeng Li, Dickson Msaky, Emmanuel Kaindoa, Gerard Killeen, Eva Herreros-Moya, Kathy J. Willis, Stephen J. Roberts

Comments: Accepted at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. 10 pages main, 39 pages including appendix. This paper accompanies the dataset found at this https URL with corresponding code at this https URL

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[82] arXiv:2110.08090 [pdf, other]: Title: Using DeepProbLog to perform Complex Event Processing on an Audio Stream

Authors: Marc Roig Vilamala, Tianwei Xing, Harrison Taylor, Luis Garcia, Mani Srivastava, Lance Kaplan, Alun Preece, Angelika Kimmig, Federico Cerutti

Comments: 8 pages, 3 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[83] arXiv:2110.08213 [pdf, other]: Title: Towards Identity Preserving Normal to Dysarthric Voice Conversion

Authors: Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda

Comments: Submitted to ICASSP 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[84] arXiv:2110.08352 [pdf, other]: Title: Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet

Authors: Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang, Xiaohui Zhang, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[85] arXiv:2110.08437 [pdf, other]: Title: NN3A: Neural Network supported Acoustic Echo Cancellation, Noise Suppression and Automatic Gain Control for Real-Time Communications

Authors: Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu

Comments: submitted to ICASSP2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2110.08439 [pdf, other]: Title: Controllable Multichannel Speech Dereverberation based on Deep Neural Networks

Authors: Ziteng Wang, Yueyue Na, Biao Tian, Qiang Fu

Comments: submitted to ICASSP2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2110.08634 [pdf, other]: Title: Towards Robust Waveform-Based Acoustic Models

Authors: Dino Oglic, Zoran Cvetkovic, Peter Sollich, Steve Renals, Bin Yu

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[88] arXiv:2110.08731 [pdf, ps, other]: Title: Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms

Authors: Tien-Hong Lo, Yao-Ting Sung, Berlin Chen

Comments: 7 pages, 2 figures, 4 tables, accepted to Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[89] arXiv:2110.08821 [pdf, other]: Title: Storage and Authentication of Audio Footage for IoAuT Devices Using Distributed Ledger Technology

Authors: Srivatsav Chenna, Nils Peters

Comments: 11 pages, 3 Figures, 1 code listing

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[90] arXiv:2110.08895 [pdf, other]: Title: DECAR: Deep Clustering for learning general-purpose Audio Representations

Authors: Sreyan Ghosh, Sandesh V Katta, Ashish Seth, S. Umesh

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[91] arXiv:2110.09103 [pdf, other]: Title: LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech

Authors: Wen-Chin Huang, Erica Cooper, Junichi Yamagishi, Tomoki Toda

Comments: Submitted to ICASSP 2022. Code available at: this https URL

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[92] arXiv:2110.09116 [pdf, ps, other]: Title: Real Additive Margin Softmax for Speaker Verification

Authors: Lantian Li, Ruiqian Nai, Dong Wang

Comments: Submitted to ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[93] arXiv:2110.09121 [pdf, ps, other]: Title: KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke

Authors: Xiaobin Zhuang, Huiran Yu, Weifeng Zhao, Tao Jiang, Peng Hu

Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2110.09127 [pdf, other]: Title: SpecTNT: a Time-Frequency Transformer for Music Audio

Authors: Wei-Tsung Lu, Ju-Chiang Wang, Minz Won, Keunwoo Choi, Xuchen Song

Comments: 6 pages

Journal-ref: International Society for Music Information Retrieval (ISMIR) 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[95] arXiv:2110.09223 [pdf, other]: Title: Learning Models for Query by Vocal Percussion: A Comparative Study

Authors: Alejandro Delgado, SkoT McDonald, Ning Xu, Charalampos Saitis, Mark Sandler

Comments: Published in proceedings of the International Computer Music Conference (ICMC) 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2110.09239 [pdf, ps, other]: Title: EIHW-MTG: Second DiCOVA Challenge System Report

Authors: Adria Mallol-Ragolta, Helena Cuesta, Emilia Gómez, Björn W. Schuller

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[97] arXiv:2110.09441 [pdf, other]: Title: FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection

Authors: Zhenyu Zhang, Yewei Gu, Xiaowei Yi, Xianfeng Zhao

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[98] arXiv:2110.09598 [pdf, ps, other]: Title: Adversarial Domain Adaptation with Paired Examples for Acoustic Scene Classification on Different Recording Devices

Authors: Stanisław Kacprzak, Konrad Kowalczyk

Comments: Accepted for publication in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021

Journal-ref: 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021, pp. 1030-103

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[99] arXiv:2110.09600 [pdf, other]: Title: Who calls the shots? Rethinking Few-Shot Learning for Audio

Authors: Yu Wang, Nicholas J. Bryan, Justin Salamon, Mark Cartwright, Juan Pablo Bello

Comments: WASPAA 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100] arXiv:2110.09605 [pdf, other]: Title: Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks

Authors: Marco Comunità, Huy Phan, Joshua D. Reiss

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[101] arXiv:2110.09698 [pdf, other]: Title: Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

Authors: Mutian He, Jingzhou Yang, Lei He, Frank K. Soong

Comments: 5 pages, 3 figures; accepted by Interspeech 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[102] arXiv:2110.09720 [pdf, other]: Title: Rep Works in Speaker Verification

Authors: Yufeng Ma, Miao Zhao, Yiwei Ding, Yu Zheng, Min Liu, Minqiang Xu

Comments: submitted to ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2110.09780 [pdf, other]: Title: Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation

Authors: Fengyu Yang, Jian Luan, Yujun Wang

Comments: accepted by ICASSP2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104] arXiv:2110.09784 [pdf, other]: Title: SSAST: Self-Supervised Audio Spectrogram Transformer

Authors: Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

Comments: Accepted at AAAI2022. Code at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[105] arXiv:2110.09814 [pdf, other]: Title: Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition

Authors: Haozhe Chen, Weiming Zhang, Kunlin Liu, Kejiang Chen, Han Fang, Nenghai Yu

Comments: 5 pages, 2 figures. Accepted by 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[106] arXiv:2110.10010 [pdf, other]: Title: Temporal separation of whale vocalizations from background oceanic noise using a power calculation

Authors: Jacques van Wyk, Jaco Versfeld, Johan du Preez

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2110.10103 [pdf, other]: Title: Continual self-training with bootstrapped remixing for speech enhancement

Authors: Efthymios Tzinis, Yossi Adi, Vamsi K. Ithapu, Buye Xu, Anurag Kumar

Comments: To appear in Proc. ICASSP 2022, May 22-27, 2022, Singapore

Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[108] arXiv:2110.10402 [pdf, other]: Title: An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

Authors: Huaibo Zhao, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

Comments: Accepted to APSIPA 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[109] arXiv:2110.10491 [pdf, ps, other]: Title: A Study On Data Augmentation In Voice Anti-Spoofing

Authors: Ariel Cohen, Inbal Rimon, Eran Aflalo, Haim Permuter

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[110] arXiv:2110.10593 [pdf, ps, other]: Title: Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method

Authors: Chenyang Gao, Yue Gu, Ivan Marsic

Comments: Submitted to Interspeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[111] arXiv:2110.10739 [pdf, other]: Title: Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training

Authors: Aswin Sivaraman, Scott Wisdom, Hakan Erdogan, John R. Hershey

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112] arXiv:2110.10757 [pdf, other]: Title: TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement

Authors: Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

Comments: Accepted for publication in ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113] arXiv:2110.10983 [pdf, other]: Title: Optimizing Multi-Taper Features for Deep Speaker Verification

Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Comments: To appear in IEEE Signal Processing Letters

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[114] arXiv:2110.11499 [pdf, other]: Title: Wav2CLIP: Learning Robust Audio Representations From CLIP

Authors: Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, Juan Pablo Bello

Comments: Copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[115] arXiv:2110.11807 [pdf, ps, other]: Title: Signal-Envelope: A C++ library with Python bindings for temporal envelope estimation

Authors: Carlos Tarjano, Valdecy Pereira

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2110.11844 [pdf, other]: Title: Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network

Authors: Ashutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

Comments: Accepted for publication in INTERSPEECH 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2110.12138 [pdf, other]: Title: Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding

Authors: Wei Wang, Shuo Ren, Yao Qian, Shujie Liu, Yu Shi, Yanmin Qian, Michael Zeng

Comments: submitted to ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2110.12539 [pdf, other]: Title: Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech

Authors: Marek Strong, Jonas Rohnke, Antonio Bonafonte, Mateusz Łajszczak, Trevor Wood

Comments: 5 pages, 5 figures, accepted at IberSPEECH 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[119] arXiv:2110.12561 [pdf, other]: Title: Lhotse: a speech data representation library for the modern deep learning ecosystem

Authors: Piotr Żelasko, Daniel Povey, Jan "Yenda" Trmal, Sanjeev Khudanpur

Comments: Accepted for presentation at NeurIPS 2021 Data-Centric AI (DCAI) Workshop

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120] arXiv:2110.12612 [pdf, other]: Title: DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021

Authors: Yanqing Liu, Zhihang Xu, Gang Wang, Kuan Chen, Bohan Li, Xu Tan, Jinzhu Li, Lei He, Sheng Zhao

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[121] arXiv:2110.12778 [pdf, other]: Title: A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments

Authors: Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis

Comments: arXiv admin note: text overlap with arXiv:2105.04488

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[122] arXiv:2110.12855 [pdf, other]: Title: Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience

Authors: Wei-Tsung Lu, Meng-Hsuan Wu, Yuh-Ming Chiu, Li Su

Comments: 9 pages, Proceedings of the 29th ACM International Conference on Multimedia

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[123] arXiv:2110.13071 [pdf, other]: Title: Unsupervised Source Separation By Steering Pretrained Music Models

Authors: Ethan Manilow, Patrick O'Reilly, Prem Seetharaman, Bryan Pardo

Comments: Submitted to ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[124] arXiv:2110.13130 [pdf, other]: Title: Multichannel Speech Enhancement without Beamforming

Authors: Asutosh Pandey, Buye Xu, Anurag Kumar, Jacob Donley, Paul Calamia, DeLiang Wang

Comments: Accepted for publication in ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2110.13323 [pdf, other]: Title: Deep Learning Tools for Audacity: Helping Researchers Expand the Artist's Toolkit

Authors: Hugo Flores Garcia, Aldo Aguilar, Ethan Manilow, Dmitry Vedenko, Bryan Pardo

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[126] arXiv:2110.13465 [pdf, other]: Title: CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization

Authors: Ruiteng Zhang, Jianguo Wei, Wenhuan Lu, Lin Zhang, Yantao Ji, Junhai Xu, Xugang Lu

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127] arXiv:2110.13589 [pdf, other]: Title: AQP: An Open Modular Python Platform for Objective Speech and Audio Quality Metrics

Authors: Jack Geraghty, Jiazheng Li, Alessandro Ragano, Andrew Hines

Comments: 6 pages, 3 figures, accepted and presented at ACM MMSys22, June, 2022, Athlone, Ireland

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128] arXiv:2110.14131 [pdf, other]: Title: Temporal Knowledge Distillation for On-device Audio Classification

Authors: Kwanghee Choi, Martin Kersner, Jacob Morton, Buru Chang

Comments: ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[129] arXiv:2110.14422 [pdf, ps, other]: Title: Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning

Authors: Shijun Wang, Dimche Kostadinov, Damian Borth

Comments: Published in: 2022 International Joint Conference on Neural Networks (IJCNN)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[130] arXiv:2110.14425 [pdf, other]: Title: Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data

Authors: Pablo Gimeno, Victoria Mingote, Alfonso Ortega, Antonio Miguel, Eduardo Lleida

Journal-ref: IEEE Signal Processing Letters, vol. 28, pp. 1135-1139, 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[131] arXiv:2110.14434 [pdf, ps, other]: Title: Nonnegative Tucker Decomposition with Beta-divergence for Music Structure Analysis of Audio Signals

Authors: Axel Marmoret, Florian Voorwinden, Valentin Leplat, Jérémy E. Cohen, Frédéric Bimbot

Comments: 4 pages, 2 figures, 1 table, 1 algorithm. To be published in GRETSI2022. The algorithm is available at this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Numerical Analysis (math.NA)
[132] arXiv:2110.14437 [pdf, other]: Title: Exploring single-song autoencoding schemes for audio-based music structure analysis

Authors: Axel Marmoret, Jérémy E. Cohen, Frédéric Bimbot

Comments: 4 pages, 4 figures, 2 tables. Rejected from ICASSP 2022, an extended version is available at arXiv:2202.04981

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[133] arXiv:2110.14513 [pdf, other]: Title: Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Authors: Hyeong-Seok Choi, Juheon Lee, Wansoo Kim, Jie Hwan Lee, Hoon Heo, Kyogu Lee

Comments: Neural Information Processing Systems (NeurIPS) 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[134] arXiv:2110.15316 [pdf, ps, other]: Title: VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge

Authors: Yougen Yuan, Zhiqiang Lv, Shen Huang, Pengfei Hu

Comments: 6 pages, in Chinese language, 3 tables, NCMMC 2021 conference paper

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2110.15430 [pdf, other]: Title: Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction

Authors: Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang

Comments: 5 pages, 1 figure, submitted to ICASSP 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[136] arXiv:2110.15729 [pdf, ps, other]: Title: Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems

Authors: Mohd Abbas Zaidi, Beomseok Lee, Sangha Kim, Chanwoo Kim

Comments: 5 pages, 3 figures, 1 table

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[137] arXiv:2110.15792 [pdf, other]: Title: VRAIN-UPV MLLP's system for the Blizzard Challenge 2021

Authors: Alejandro Pérez-González-de-Martos, Albert Sanchis, Alfons Juan

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2110.00508 (cross-list from cs.LG) [pdf, other]: Title: An Ensemble-based Multi-Criteria Decision Making Method for COVID-19 Cough Classification

Authors: Nihad Karim Chowdhury, Muhammad Ashad Kabir, Md. Muhtadir Rahman

Comments: 21 pages, 6 figures

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[139] arXiv:2110.01001 (cross-list from cs.MM) [pdf, other]: Title: Multimodal Fusion Based Attentive Networks for Sequential Music Recommendation

Authors: Kunal Vaswani, Yudhik Agrawal, Vinoo Alluri

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140] arXiv:2110.02404 (cross-list from cs.CV) [pdf, other]: Title: 3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video

Authors: Justin Wilson, Ming C. Lin

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141] arXiv:2110.02405 (cross-list from cs.CV) [pdf, other]: Title: Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction

Authors: Justin Wilson, Nicholas Rewkowski, Ming C. Lin, Henry Fuchs

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142] arXiv:2110.02498 (cross-list from cs.CR) [pdf, other]: Title: Adversarial Attacks on Machinery Fault Diagnosis

Authors: Jiahao Chen, Diqun Yan

Comments: 5 pages, 5 figures. Submitted to Interspeech 2022

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2110.02891 (cross-list from cs.LG) [pdf, other]: Title: Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models

Authors: Jen-Hao Rick Chang, Ashish Shrivastava, Hema Swetha Koppula, Xiaoshuai Zhang, Oncel Tuzel

Comments: ICML 2022

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2110.03047 (cross-list from cs.CL) [pdf, ps, other]: Title: Integrating Categorical Features in End-to-End ASR

Authors: Rongqing Huang

Comments: Submitted to ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145] arXiv:2110.03281 (cross-list from cs.LG) [pdf, other]: Title: Detecting Autism Spectrum Disorders with Machine Learning Models Using Speech Transcripts

Authors: Vikram Ramesh, Rida Assaf

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146] arXiv:2110.03326 (cross-list from cs.CL) [pdf, other]: Title: Back from the future: bidirectional CTC decoding using future information in speech recognition

Authors: Namkyu Jung, Geonmin Kim, Han-Gyu Kim

Comments: submitted to ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147] arXiv:2110.03427 (cross-list from cs.LG) [pdf, other]: Title: Is Attention always needed? A Case Study on Language Identification from Speech

Authors: Atanu Mandal, Santanu Pal, Indranil Dutta, Mahidas Bhattacharya, Sudip Kumar Naskar

Comments: Accepted for publication in Natural Language Engineering

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[148] arXiv:2110.03560 (cross-list from cs.CL) [pdf, ps, other]: Title: Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

Authors: Sameer Khurana, Antoine Laurent, James Glass

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2110.03609 (cross-list from cs.CL) [pdf, ps, other]: Title: Applying Phonological Features in Multilingual Text-To-Speech

Authors: Cong Zhang, Huinan Zeng, Huang Liu, Jiewen Zheng

Comments: demo webpage: this https URL

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150] arXiv:2110.03756 (cross-list from cs.CL) [pdf, ps, other]: Title: Sonorant spectra and coarticulation distinguish speakers with different dialects

Authors: Charalambos Themistocleous, Valantis Fyndanis, Kyrana Tsapkini

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[151] arXiv:2110.03847 (cross-list from cs.CL) [pdf, other]: Title: Machine Translation Verbosity Control for Automatic Dubbing

Authors: Surafel M. Lakew, Marcello Federico, Yue Wang, Cuong Hoang, Yogesh Virkar, Roberto Barra-Chicote, Robert Enyedi

Comments: Accepted at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152] arXiv:2110.03876 (cross-list from cs.CL) [pdf, other]: Title: Phone-to-audio alignment without text: A Semi-supervised Approach

Authors: Jian Zhu, Cong Zhang, David Jurgens

Comments: ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153] arXiv:2110.03879 (cross-list from cs.CL) [pdf, other]: Title: Explaining the Attention Mechanism of End-to-End Speech Recognition Using Decision Trees

Authors: Yuanchao Wang, Wenji Du, Chenghao Cai, Yanyan Xu

Comments: 10 pages, 5 figures

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154] arXiv:2110.04267 (cross-list from cs.LG) [pdf, other]: Title: Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training

Authors: Lillian Zhou, Dhruv Guliani, Andreas Kabel, Giovanni Motta, Françoise Beaufays

Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[155] arXiv:2110.04590 (cross-list from cs.CL) [pdf, other]: Title: An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition

Authors: Xuankai Chang, Takashi Maekaku, Pengcheng Guo, Jing Shi, Yen-Ju Lu, Aswin Shanmugam Subramanian, Tianzi Wang, Shu-wen Yang, Yu Tsao, Hung-yi Lee, Shinji Watanabe

Comments: To appear in ASRU2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[156] arXiv:2110.04891 (cross-list from cs.CL) [pdf, other]: Title: Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

Authors: Guoli Ye, Vadim Mazalov, Jinyu Li, Yifan Gong

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2110.04923 (cross-list from cs.LG) [pdf, ps, other]: Title: Crack detection using tap-testing and machine learning techniques to prevent potential rockfall incidents

Authors: Roya Nasimi, Fernando Moreu, John Stormont

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[158] arXiv:2110.04934 (cross-list from cs.CL) [pdf, other]: Title: Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition

Authors: Yiming Wang, Jinyu Li, Heming Wang, Yao Qian, Chengyi Wang, Yu Wu

Comments: Accepted at IEEE ICASSP 2022. 5 pages, 1 figure

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[159] arXiv:2110.05313 (cross-list from cs.LG) [pdf, other]: Title: Unsupervised Source Separation via Bayesian Inference in the Latent Domain

Authors: Michele Mancusi, Emilian Postolache, Giorgio Mariani, Marco Fumero, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

Comments: 5 pages, 2 figures, submitted to Interspeech 2022

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2110.05354 (cross-list from cs.CL) [pdf, ps, other]: Title: Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Authors: Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong

Comments: 5 pages, in Interspeech 2022

Journal-ref: Interspeech 2022, Incheon, Korea

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[161] arXiv:2110.05607 (cross-list from cs.LG) [pdf, other]: Title: Partial Variable Training for Efficient On-Device Federated Learning

Authors: Tien-Ju Yang, Dhruv Guliani, Françoise Beaufays, Giovanni Motta

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162] arXiv:2110.05752 (cross-list from cs.CL) [pdf, other]: Title: UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training

Authors: Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu

Comments: ICASSP 2022 Submission

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[163] arXiv:2110.05941 (cross-list from cs.LG) [pdf, ps, other]: Title: Rank-based loss for learning hierarchical representations

Authors: Ines Nolasco, Dan Stowell

Comments: This version corrects a bug in the baseline results

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164] arXiv:2110.06263 (cross-list from cs.CL) [pdf, other]: Title: Speech Summarization using Restricted Self-Attention

Authors: Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze

Comments: Accepted at ICASSP 2022

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2110.07187 (cross-list from cs.CL) [pdf, other]: Title: Revisiting IPA-based Cross-lingual Text-to-speech

Authors: Haitong Zhang, Haoyue Zhan, Yang Zhang, Xinyuan Yu, Yue Lin

Comments: Submitted to ICASSP2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166] arXiv:2110.07274 (cross-list from cs.CL) [pdf, other]: Title: An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings

Authors: Wenxuan Ye, Shaoguang Mao, Frank Soong, Wenshan Wu, Yan Xia, Jonathan Tien, Zhiyong Wu

Comments: Accepted by ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[167] arXiv:2110.07354 (cross-list from cs.LG) [pdf, other]: Title: Music Playlist Title Generation: A Machine-Translation Approach

Authors: SeungHeon Doh, Junwon Lee, Juhan Nam

Comments: Proceedings of the 2nd Workshop on NLP for Music and Spoken Audio, 22nd International Society for Music Information Retrieval Conference (ISMIR)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168] arXiv:2110.07410 (cross-list from cs.LG) [pdf, other]: Title: Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning

Authors: Benno Weck, Xavier Favory, Konstantinos Drossos, Xavier Serra

Comments: 5 pages, 4 figures. Accepted at Detection and Classification of Acoustic Scenes and Events 2021 (DCASE2021)

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2110.07592 (cross-list from cs.CL) [pdf, other]: Title: DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances

Authors: Sreyan Ghosh, Samden Lepcha, S Sakshi, Rajiv Ratn Shah, S. Umesh

Comments: Submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170] arXiv:2110.07749 (cross-list from cs.LG) [pdf, other]: Title: Attention-Free Keyword Spotting

Authors: Mashrur M. Morshed, Ahmad Omar Ahsan

Comments: 5 pages. Accepted at the PML4DC workshop at ICLR 2022

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[171] arXiv:2110.07840 (cross-list from cs.CL) [pdf, other]: Title: ESPnet2-TTS: Extending the Edge of TTS Research

Authors: Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe

Comments: Submitted to ICASSP2022. Demo HP: this https URL

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172] arXiv:2110.07982 (cross-list from cs.CL) [pdf, other]: Title: Scribosermo: Fast Speech-to-Text models for German and other Languages

Authors: Daniel Bermuth, Alexander Poeppel, Wolfgang Reif

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[173] arXiv:2110.08214 (cross-list from cs.CL) [pdf, other]: Title: From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation

Authors: Danni Liu, Changhan Wang, Hongyu Gong, Xutai Ma, Yun Tang, Juan Pino

Comments: Accepted by Interspeech 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174] arXiv:2110.08250 (cross-list from cs.CL) [pdf, other]: Title: Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention

Authors: Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Phillip Koehn, Juan Pino

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175] arXiv:2110.08626 (cross-list from cs.LG) [pdf, other]: Title: Learning velocity model for complex media with deep convolutional neural networks

Authors: A. Stankevich, I. Nechepurenko, A. Shevchenko, L. Gremyachikh, A. Ustyuzhanin, A. Vasyukov

Comments: 14 pages, 6 figures, 6 tables

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176] arXiv:2110.08791 (cross-list from cs.CV) [pdf, other]: Title: Taming Visually Guided Sound Generation

Authors: Vladimir Iashin, Esa Rahtu

Comments: Accepted as an oral presentation for the BMVC 2021. Code: this https URL Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177] arXiv:2110.09245 (cross-list from cs.CL) [pdf, other]: Title: Efficient Sequence Training of Attention Models using Approximative Recombination

Authors: Nils-Philipp Wynands, Wilfried Michel, Jan Rosendahl, Ralf Schlüter, Hermann Ney

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178] arXiv:2110.09264 (cross-list from cs.CL) [pdf, other]: Title: Intent Classification Using Pre-trained Language Agnostic Embeddings For Low Resource Languages

Authors: Hemant Yadav, Akshat Gupta, Sai Krishna Rallabandi, Alan W Black, Rajiv Ratn Shah

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2110.09324 (cross-list from cs.CL) [pdf, other]: Title: Automatic Learning of Subword Dependent Model Scales

Authors: Felix Meyer, Wilfried Michel, Mohammad Zeineldeen, Ralf Schlüter, Hermann Ney

Comments: submitted to ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[180] arXiv:2110.10429 (cross-list from cs.LG) [pdf, other]: Title: Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach

Authors: Mun-Hak Lee, Joon-Hyuk Chang

Comments: 4 pages + 1 page for citations + 2 pages for appendix

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[181] arXiv:2110.12136 (cross-list from cs.CV) [pdf, other]: Title: A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data

Authors: Madina Abdrakhmanova, Saniya Abushakimova, Yerbolat Khassanov, Huseyin Atakan Varol

Comments: 7 pages, 4 figures, 4 tables

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[182] arXiv:2110.12408 (cross-list from cs.ET) [pdf, other]: Title: Quantum Computer Music: Foundations and Initial Experiments

Authors: Eduardo R. Miranda, Suchitra T. Basak

Comments: Pre-publication draft, to appear in book 'Quantum Computer Music', E. R. Miranda (Ed.). arXiv admin note: text overlap with arXiv:2006.13849

Subjects: Emerging Technologies (cs.ET); Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantum Physics (quant-ph)
[183] arXiv:2110.13023 (cross-list from cs.LG) [pdf, other]: Title: ML-Based Analysis to Identify Speech Features Relevant in Predicting Alzheimer's Disease

Authors: Yash Kumar, Piyush Maheshwari, Shreyansh Joshi, Veeky Baths

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184] arXiv:2110.13250 (cross-list from cs.CR) [pdf, other]: Title: Beyond $L_p$ clipping: Equalization-based Psychoacoustic Attacks against ASRs

Authors: Hadi Abdullah, Muhammad Sajidur Rahman, Christian Peeters, Cassidy Gibson, Washington Garcia, Vincent Bindschaedler, Thomas Shrimpton, Patrick Traynor

Comments: accepted at ACML 2021

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2110.13492 (cross-list from cs.LG) [pdf, ps, other]: Title: TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining

Authors: Viet-Anh Nguyen, Anh H. T. Nguyen, Andy W. H. Khong

Comments: Published as a conference paper at ICASSP 2022, 5 pages, 4 figures, 3 tables

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186] arXiv:2110.13877 (cross-list from cs.CL) [pdf, other]: Title: Assessing Evaluation Metrics for Speech-to-Speech Translation

Authors: Elizabeth Salesky, Julian Mäder, Severin Klinger

Comments: ASRU 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2110.13900 (cross-list from cs.CL) [pdf, other]: Title: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing

Authors: Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei

Comments: Submitted to the Journal of Selected Topics in Signal Processing (JSTSP)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2110.14273 (cross-list from cs.CL) [pdf, other]: Title: Deep Learning For Prominence Detection In Children's Read Speech

Authors: Mithilesh Vaidya, Kamini Sabu, Preeti Rao

Comments: Under review at ICASSP 2022. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[189] arXiv:2110.14957 (cross-list from cs.AI) [pdf, other]: Title: End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings

Authors: Théo Deschamps-Berger (LISN, CNRS), Lori Lamel (LISN, CNRS), Laurence Devillers (LISN, CNRS, SU)

Journal-ref: 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), Sep 2021, Nara, Japan

Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[190] arXiv:2110.15222 (cross-list from cs.CL) [pdf, other]: Title: Word-level confidence estimation for RNN transducers

Authors: Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran

Journal-ref: Proc. ASRU 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191] arXiv:2110.15704 (cross-list from cs.CL) [pdf, ps, other]: Title: Influence of ASR and Language Model on Alzheimer's Disease Detection

Authors: Joan Codina-Filbà, Guillermo Cámbara, Jordi Luque, Mireia Farrús

Comments: 5 pages. Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.09272

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192] arXiv:2110.15731 (cross-list from cs.CL) [pdf, other]: Title: CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese

Authors: Arnaldo Candido Junior, Edresson Casanova, Anderson Soares, Frederico Santos de Oliveira, Lucas Oliveira, Ricardo Corso Fernandes Junior, Daniel Peixoto Pinto da Silva, Fernando Gorgulho Fayet, Bruno Baldissera Carlotto, Lucas Rafael Stefanel Gris, Sandra Maria Aluísio

Comments: This paper is under consideration at Language Resources and Evaluation (LREV)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2110.15790 (cross-list from cs.IR) [pdf, other]: Title: LSTM-RPA: A Simple but Effective Long Sequence Prediction Algorithm for Music Popularity Prediction

Authors: Kun Li, Meng Li, Yanling Li, Min Lin

Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Social and Information Networks (cs.SI); Audio and Speech Processing (eess.AS)
[194] arXiv:2110.15836 (cross-list from cs.CL) [pdf, ps, other]: Title: Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition

Authors: Chak-Fai Li, Francis Keith, William Hartmann, Matthew Snover

Comments: 5 pages, minor changes for camera ready version, to be published in IEEE ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195] arXiv:2110.15909 (cross-list from cs.LG) [pdf, other]: Title: Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words

Authors: Santiago Cuervo, Maciej Grabias, Jan Chorowski, Grzegorz Ciesielski, Adrian Łańcucki, Paweł Rychlikowski, Ricard Marxer

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196] arXiv:2110.15941 (cross-list from cs.LG) [pdf, other]: Title: Personalized breath based biometric authentication with wearable multimodality

Authors: Manh-Ha Bui, Viet-Anh Tran, Cuong Pham

Comments: 7 pages (2 columns), 5 tables, 7 figures, submitted to ACM Multimedia 2020

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197] arXiv:2110.00165 (cross-list from eess.AS) [pdf, other]: Title: Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

Authors: Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[198] arXiv:2110.00275 (cross-list from eess.AS) [pdf, other]: Title: SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection

Authors: Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon-Seng Gan

Comments: (c) 2022 IEEE

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1749-1762, 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[199] arXiv:2110.00745 (cross-list from eess.AS) [pdf, other]: Title: End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression

Authors: Karn N. Watcharasupat, Thi Ngoc Tho Nguyen, Woon-Seng Gan, Shengkui Zhao, Bin Ma

Comments: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)

Journal-ref: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 656-660

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[200] arXiv:2110.00797 (cross-list from eess.AS) [pdf, other]: Title: Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition

Authors: Protima Nomo Sudro, Rohan Kumar Das, Rohit Sinha, S. R. Mahadeva Prasanna

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[201] arXiv:2110.01077 (cross-list from eess.AS) [pdf, other]: Title: Multi-task Voice Activated Framework using Self-supervised Learning

Authors: Shehzeen Hussain, Van Nguyen, Shuhua Zhang, Erik Visser

Comments: Accepted at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[202] arXiv:2110.01164 (cross-list from eess.AS) [pdf, other]: Title: Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks

Authors: Zhaojie Luo, Shoufeng Lin, Rui Liu, Jun Baba, Yuichiro Yoshikawa, Ishiguro Hiroshi

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[203] arXiv:2110.01177 (cross-list from eess.AS) [pdf, other]: Title: The Second DiCOVA Challenge: Dataset and performance analysis for COVID-19 diagnosis using acoustics

Authors: Neeraj Kumar Sharma, Srikanth Raj Chetupalli, Debarpan Bhattacharya, Debottam Dutta, Pravin Mote, Sriram Ganapathy

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[204] arXiv:2110.01422 (cross-list from eess.AS) [pdf, ps, other]: Title: Individualized sound pressure equalization in hearing devices exploiting an electro-acoustic model

Authors: Henning Schepker, Reinhild Rohden, Florian Denk, Birger Kollmeier, Matthias Blau, Simon Doclo

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[205] arXiv:2110.01436 (cross-list from eess.AS) [pdf, other]: Title: WaveBeat: End-to-end beat and downbeat tracking in the time domain

Authors: Christian J. Steinmetz, Joshua D. Reiss

Comments: To appear at the 151st AES Convention

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[206] arXiv:2110.01763 (cross-list from eess.AS) [pdf, other]: Title: DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors

Authors: Chandan K A Reddy, Vishak Gopal, Ross Cutler

Comments: arXiv admin note: substantial text overlap with arXiv:2010.15258

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[207] arXiv:2110.02077 (cross-list from eess.AS) [pdf, other]: Title: Deep Optimization of Parametric IIR Filters for Audio Equalization

Authors: Giovanni Pepe (1 and 2), Leonardo Gabrielli (1), Stefano Squartini (1), Carlo Tripodi (2), Nicolò Strozzi (2) ((1) Università Politecnica delle Marche, (2) ASK Industries S.p.A.)

Comments: submitted to IEEE/ACM TASLP on 12 May 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[208] arXiv:2110.02144 (cross-list from eess.AS) [pdf, other]: Title: Late reverberation suppression using U-nets

Authors: Diego León, Felipe Tobar

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[209] arXiv:2110.02151 (cross-list from eess.AS) [pdf, other]: Title: Detection of blue whale vocalisations using a temporal-domain convolutional neural network

Authors: Bryan Sagredo, Sonia Español-Jiménez, Felipe Tobar

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[210] arXiv:2110.02189 (cross-list from eess.AS) [pdf, ps, other]: Title: Manifold learning-supported estimation of relative transfer functions for spatial filtering

Authors: Andreas Brendel, Johannes Zeitler, Walter Kellermann

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[211] arXiv:2110.02285 (cross-list from eess.AS) [pdf, ps, other]: Title: Modelling of the Fender Bassman 5F6-A Tone Stack

Authors: Steven Fenton

Comments: 5 pages, 6 figures. General Reference Paper

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[212] arXiv:2110.02345 (cross-list from eess.AS) [pdf, other]: Title: Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

Authors: Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

Comments: arXiv admin note: substantial text overlap with arXiv:2106.02170

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[213] arXiv:2110.02360 (cross-list from eess.AS) [pdf, other]: Title: Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

Authors: Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[214] arXiv:2110.02592 (cross-list from eess.AS) [pdf, other]: Title: Improving Real-time Score Following in Opera by Combining Music with Lyrics Tracking

Authors: Charles Brazier, Gerhard Widmer

Comments: 5 pages, In Proceedings of the 2nd Workshop on NLP for Music and Audio (NLP4MusA), Online, 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[215] arXiv:2110.02695 (cross-list from eess.AS) [pdf, other]: Title: Lower Interaural Coherence in Off-Signal Bands Impairs Binaural Detection

Authors: Bernhard Eurich, Jörg Encke, Stephan D. Ewert, Mathias Dietz

Comments: 14 pages, 5 figures

Journal-ref: J. Acoust. Soc. Am. 151(6), 2022, 3927-3936

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[216] arXiv:2110.03010 (cross-list from eess.AS) [pdf, other]: Title: AECMOS: A speech quality assessment metric for echo impairment

Authors: Marju Purin, Sten Sootla, Mateja Sponza, Ando Saabas, Ross Cutler

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[217] arXiv:2110.03103 (cross-list from eess.AS) [pdf, other]: Title: Lightweight Speech Enhancement in Unseen Noisy and Reverberant Conditions using KISS-GEV Beamforming

Authors: Thomas Bernard, François Grondin

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[218] arXiv:2110.03114 (cross-list from eess.AS) [pdf, other]: Title: On audio enhancement via online non-negative matrix factorization

Authors: Andrew Sack, Wenzhao Jiang, Michael Perlmutter, Palina Salanevich, Deanna Needell

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[219] arXiv:2110.03151 (cross-list from eess.AS) [pdf, other]: Title: Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR

Authors: Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

Comments: To appear in ICASSP 2022; System labels (SC and VBx) in Table 1 have been fixed

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[220] arXiv:2110.03299 (cross-list from eess.AS) [pdf, other]: Title: End-To-End Label Uncertainty Modeling for Speech-based Arousal Recognition Using Bayesian Neural Networks

Authors: Navin Raj Prabhu, Guillaume Carbajal, Nale Lehmann-Willenbrock, Timo Gerkmann

Comments: ACCEPTED to INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[221] arXiv:2110.03329 (cross-list from eess.AS) [pdf, other]: Title: Towards Universal Neural Vocoding with a Multi-band Excited WaveNet

Authors: Axel Roebel, Frederik Bous

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[222] arXiv:2110.03347 (cross-list from eess.AS) [pdf, ps, other]: Title: Cloning one's voice using very limited data in the wild

Authors: Dongyang Dai, Yuanzhe Chen, Li Chen, Ming Tu, Lu Liu, Rui Xia, Qiao Tian, Yuping Wang, Yuxuan Wang

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[223] arXiv:2110.03511 (cross-list from eess.AS) [pdf, other]: Title: Peer Collaborative Learning for Polyphonic Sound Event Detection

Authors: Hayato Endo, Hiromitsu Nishizaki

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[224] arXiv:2110.03630 (cross-list from eess.AS) [pdf, other]: Title: Towards Faster Continuous Multi-Channel HRTF Measurements Based on Learning System Models

Authors: Tobias Kabzinski, Peter Jax

Comments: 5 pages, 4 figures, minor changes compared to v1 after reviewers' feedback, accepted at ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[225] arXiv:2110.03691 (cross-list from eess.SP) [pdf, other]: Title: Direct design of biquad filter cascades with deep learning by sampling random polynomials

Authors: Joseph T. Colonel, Christian J. Steinmetz, Marcus Michelen, Joshua D. Reiss

Comments: Accepted to ICASSP 2022

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[226] arXiv:2110.03715 (cross-list from eess.AS) [pdf, other]: Title: PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition

Authors: Boris Bergsma, Minhao Yang, Milos Cernak

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[227] arXiv:2110.03857 (cross-list from eess.AS) [pdf, other]: Title: A study on the efficacy of model pre-training in developing neural text-to-speech system

Authors: Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[228] arXiv:2110.03887 (cross-list from eess.AS) [pdf, other]: Title: Environment Aware Text-to-Speech Synthesis

Authors: Daxin Tan, Guangyan Zhang, Tan Lee

Comments: Accepted by Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[229] arXiv:2110.03894 (cross-list from eess.AS) [pdf, other]: Title: Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition

Authors: Hao Yen, Pin-Jui Ku, Chao-Han Huck Yang, Hu Hu, Sabato Marco Siniscalchi, Pin-Yu Chen, Yu Tsao

Comments: Accepted to Interspeech 2023. Code is available at: this https URL. Selected as Best Student Paper Candidate

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
[230] arXiv:2110.03965 (cross-list from eess.AS) [pdf, other]: Title: Joint Scattering for Automatic Chick Call Recognition

Authors: Changhong Wang, Emmanouil Benetos, Shuge Wang, Elisabetta Versace

Comments: 5 pages, submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[231] arXiv:2110.04005 (cross-list from eess.AS) [pdf, other]: Title: KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms

Authors: Chien-Feng Liao, Jen-Yu Liu, Yi-Hsuan Yang

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[232] arXiv:2110.04047 (cross-list from eess.AS) [pdf, other]: Title: TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation

Authors: Ali Aroudi, Stefan Uhlich, Marc Ferras Font

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[233] arXiv:2110.04056 (cross-list from eess.AS) [pdf, ps, other]: Title: Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask

Authors: Shaoshi Ling, Chen Shen, Meng Cai, Zejun Ma

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[234] arXiv:2110.04082 (cross-list from eess.AS) [pdf, other]: Title: A Method for Capturing and Reproducing Directional Reverberation in Six Degrees of Freedom

Authors: Benoit Alary, Vesa Välimäki

Comments: This work has been accepted for the I3DA 2021 International Conference and will be submitted to IEEE Xplore Digital Library for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[235] arXiv:2110.04153 (cross-list from eess.AS) [pdf, other]: Title: Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Authors: Pengfei Wu, Junjie Pan, Chenchang Xu, Junhui Zhang, Lin Wu, Xiang Yin, Zejun Ma

Comments: Submitted to ICASSP 2022, 5 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[236] arXiv:2110.04187 (cross-list from eess.AS) [pdf, other]: Title: SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition

Authors: Li Fu, Xiaoxiao Li, Runyu Wang, Lu Fan, Zhengchen Zhang, Meng Chen, Youzheng Wu, Xiaodong He

Comments: INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[237] arXiv:2110.04265 (cross-list from eess.AS) [pdf, other]: Title: A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

Authors: Ge Zhu, Frank Cwitkowitz, Zhiyao Duan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[238] arXiv:2110.04289 (cross-list from eess.AS) [pdf, other]: Title: Location-based training for multi-channel talker-independent speaker separation

Authors: Hassan Taherian, Ke Tan, DeLiang Wang

Comments: submitted to ICASSP 22

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[239] arXiv:2110.04331 (cross-list from eess.AS) [pdf, ps, other]: Title: MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection

Authors: Chandan K.A. Reddy, Vishak Gopal, Harishchandra Dubey, Sergiy Matusevych, Ross Cutler, Robert Aichner

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[240] arXiv:2110.04378 (cross-list from eess.AS) [pdf, other]: Title: Performance optimizations on deep noise suppression models

Authors: Jerry Chee, Sebastian Braun, Vishak Gopal, Ross Cutler

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[241] arXiv:2110.04385 (cross-list from eess.AS) [pdf, other]: Title: Individualized Hear-through For Acoustic Transparency Using PCA-Based Sound Pressure Estimation At The Eardrum

Authors: Wenyu Jin, Tim Schoof, Henning Schepker

Comments: 5 pages, 5 figures, accepted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[242] arXiv:2110.04391 (cross-list from eess.AS) [pdf, other]: Title: Aura: Privacy-preserving Augmentation to Improve Test Set Diversity in Speech Enhancement

Authors: Xavier Gitiaux, Aditya Khant, Ebrahim Beyrami, Chandan Reddy, Jayant Gupchup, Ross Cutler

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[243] arXiv:2110.04410 (cross-list from eess.AS) [pdf, other]: Title: TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context

Authors: Nithin Rao Koluguri, Taejin Park, Boris Ginsburg

Comments: preprint. Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[244] arXiv:2110.04440 (cross-list from eess.AS) [pdf, other]: Title: Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia Using Convolutional Neural Networks

Authors: Yashish M. Siriwardena, Chris Kitchen, Deanna L. Kelly, Carol Espy-Wilson

Comments: 5 pages. arXiv admin note: text overlap with arXiv:2102.07054

Journal-ref: Proceedings of the 2021 International Conference on Multimodal Interaction

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[245] arXiv:2110.04482 (cross-list from eess.AS) [pdf, other]: Title: Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis

Authors: Mu Yang, Shaojin Ding, Tianlong Chen, Tong Wang, Zhangyang Wang

Comments: Accepted to ICASSP 2022. Camera-ready

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[246] arXiv:2110.04484 (cross-list from eess.AS) [pdf, other]: Title: Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR

Authors: Han Zhu, Li Wang, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Comments: Accepted by Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[247] arXiv:2110.04511 (cross-list from eess.AS) [pdf, other]: Title: Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition

Authors: Si-Ioi Ng, Tan Lee

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[248] arXiv:2110.04584 (cross-list from eess.AS) [pdf, other]: Title: Visually Exploring Multi-Purpose Audio Data

Authors: David Heise, Helen L. Bear

Comments: Presented at MMSP 2021

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[249] arXiv:2110.04585 (cross-list from eess.AS) [pdf, other]: Title: An evaluation of data augmentation methods for sound scene geotagging

Authors: Helen L. Bear, Veronica Morfi, Emmanouil Benetos

Comments: Presented at Interspeech 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[250] arXiv:2110.04612 (cross-list from eess.AS) [pdf, other]: Title: Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets

Authors: Jimmy Tobin, Katrin Tomanek

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[251] arXiv:2110.04654 (cross-list from eess.AS) [pdf, other]: Title: Complex Network-Based Approach for Feature Extraction and Classification of Musical Genres

Authors: Matheus Henrique Pimenta-Zanon, Glaucia Maria Bressan, Fabrício Martins Lopes

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[252] arXiv:2110.04692 (cross-list from eess.AS) [pdf, other]: Title: Poformer: A simple pooling transformer for speaker verification

Authors: Yufeng Ma, Yiwei Ding, Miao Zhao, Yu Zheng, Min Liu, Minqiang Xu

Comments: submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[253] arXiv:2110.04694 (cross-list from eess.AS) [pdf, other]: Title: Multi-Channel End-to-End Neural Diarization with Distributed Microphones

Authors: Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, Yohei Kawaguchi

Comments: Accepted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[254] arXiv:2110.04775 (cross-list from eess.AS) [pdf, other]: Title: Estimating the confidence of speech spoofing countermeasure

Authors: Xin Wang, Junichi Yamagishi

Comments: Work in progress. Comments are welcome. Accepted by ICASSP 2022. Code is available at this https URL. Not all the comments from anonymous reviewers could be addressed within 4 pages; we apologize for that

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[255] arXiv:2110.04791 (cross-list from eess.AS) [pdf, other]: Title: Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain

Authors: Zengwei Yao, Wenjie Pei, Fanglin Chen, Guangming Lu, David Zhang

Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[256] arXiv:2110.04850 (cross-list from eess.AS) [pdf, other]: Title: Direct source and early reflections localization using deep deconvolution network under reverberant environment

Authors: Shan Gao, Xihong Wu, Tianshu Qu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[257] arXiv:2110.04908 (cross-list from eess.AS) [pdf, other]: Title: DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation

Authors: Suraj Kothawade, Anmol Mekala, Chandra Sekhara D, Mayank Kothyari, Rishabh Iyer, Ganesh Ramakrishnan, Preethi Jyothi

Comments: ACL 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[258] arXiv:2110.04948 (cross-list from eess.AS) [pdf, other]: Title: Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

Authors: Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[259] arXiv:2110.05036 (cross-list from eess.AS) [pdf, other]: Title: Multi-View Self-Attention Based Transformer for Speaker Recognition

Authors: Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang

Comments: Paper to appear at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[260] arXiv:2110.05249 (cross-list from eess.AS) [pdf, other]: Title: A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Authors: Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Comments: Accepted to ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[261] arXiv:2110.05267 (cross-list from eess.AS) [pdf, other]: Title: Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition

Authors: Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Comments: 5 pages, 7 figures, Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[262] arXiv:2110.05431 (cross-list from eess.AS) [pdf, other]: Title: On the invertibility of a voice privacy system using embedding alignement

Authors: Pierre Champion (MULTISPEECH, LIUM), Thomas Thebaud (LIUM), Gaël Le Lan, Anthony Larcher (LIUM), Denis Jouvet (MULTISPEECH)

Journal-ref: ASRU 2021 - IEEE Automatic Speech Recognition and Understanding Workshop, Dec 2021, Cartagena, Colombia

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[263] arXiv:2110.05632 (cross-list from stat.AP) [pdf, other]: Title: Wind-robust sound event detection and denoising for bioacoustics

Authors: Julius Juodakis, Stephen Marsland

Comments: 34 pages, 5 figures, 2 supplementary figures

Subjects: Applications (stat.AP); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[264] arXiv:2110.05695 (cross-list from eess.AS) [pdf, ps, other]: Title: The Mirrornet : Learning Audio Synthesizer Controls Inspired by Sensorimotor Interaction

Authors: Yashish M. Siriwardena, Guilhem Marion, Shihab Shamma

Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[265] arXiv:2110.05745 (cross-list from eess.AS) [pdf, other]: Title: VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Authors: Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Comments: 5 pages, 1 figure, 3 tables, submitted to ICASSP 2022; updated reference information of [33]

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[266] arXiv:2110.05948 (cross-list from eess.SP) [pdf, other]: Title: Denoising Diffusion Gamma Models

Authors: Eliya Nachmani, Robin San Roman, Lior Wolf

Comments: arXiv admin note: substantial text overlap with arXiv:2106.07582

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[267] arXiv:2110.05994 (cross-list from eess.AS) [pdf, other]: Title: Word Order Does Not Matter For Speech Recognition

Authors: Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[268] arXiv:2110.06126 (cross-list from eess.AS) [pdf, other]: Title: Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

Authors: Ricardo Falcon-Perez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

Comments: 5 pages, 2 figures, 4 tables. Submitted to the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[269] arXiv:2110.06304 (cross-list from eess.AS) [pdf, other]: Title: Generalized Time Domain Velocity Vector

Authors: Srđan Kitić, Jérôme Daniel

Comments: Submitted

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[270] arXiv:2110.06306 (cross-list from eess.AS) [pdf, other]: Title: Fine-grained style control in Transformer-based Text-to-speech Synthesis

Authors: Li-Wei Chen, Alexander Rudnicky

Comments: Accepted in ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[271] arXiv:2110.06309 (cross-list from eess.AS) [pdf, other]: Title: Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition

Authors: Li-Wei Chen, Alexander Rudnicky

Comments: Accepted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[272] arXiv:2110.06428 (cross-list from eess.AS) [pdf, other]: Title: All-neural beamformer for continuous speech separation

Authors: Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Comments: 5 pages, 3 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[273] arXiv:2110.06434 (cross-list from eess.AS) [pdf, other]: Title: DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding

Authors: Sergey Nikonorov, Berrak Sisman, Mingyang Zhang, Haizhou Li

Comments: Accepted to ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[274] arXiv:2110.06440 (cross-list from eess.AS) [pdf, other]: Title: SDR -- Medium Rare with Fast Computations

Authors: Robin Scheibler

Comments: 5 pages, 3 figures, 2 tables. Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[275] arXiv:2110.06546 (cross-list from eess.AS) [pdf, other]: Title: A Melody-Unsupervision Model for Singing Voice Synthesis

Authors: Soonbeom Choi, Juhan Nam

Comments: ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[276] arXiv:2110.06691 (cross-list from eess.AS) [pdf, other]: Title: Diverse Audio Captioning via Adversarial Training

Authors: Xinhao Mei, Xubo Liu, Jianyuan Sun, Mark D. Plumbley, Wenwu Wang

Comments: 5 pages, 1 figure, accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[277] arXiv:2110.07116 (cross-list from eess.AS) [pdf, other]: Title: Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization

Authors: Yechan Yu, Dongkeon Park, Hong Kook Kim

Comments: Submitted to ICASSP 2022, equal contribution from first two authors

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[278] arXiv:2110.07124 (cross-list from eess.AS) [pdf, other]: Title: Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

Authors: Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Naoya Takahashi, Emiru Tsunoo, Yuki Mitsufuji

Comments: 5 pages, 3 figures, accepted for publication in IEEE ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[279] arXiv:2110.07192 (cross-list from eess.AS) [pdf, other]: Title: Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech

Authors: Haoyue Zhan, Xinyuan Yu, Haitong Zhang, Yang Zhang, Yue Lin

Comments: Accepted by Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[280] arXiv:2110.07205 (cross-list from eess.AS) [pdf, other]: Title: SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

Authors: Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

Comments: Accepted by ACL 2022 main conference

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[281] arXiv:2110.07216 (cross-list from eess.AS) [pdf, other]: Title: FedSpeech: Federated Text-to-Speech with Continual Learning

Authors: Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

Comments: Accepted by IJCAI 2021

Journal-ref: 2021. Main Track. Pages 3829-3835

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[282] arXiv:2110.07419 (cross-list from eess.AS) [pdf, other]: Title: Student-t Networks for Melody Estimation

Authors: Udhav Gupta, Avi, Bhavesh Jain

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[283] arXiv:2110.07468 (cross-list from eess.AS) [pdf, other]: Title: SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation

Authors: Rongjie Huang, Chenye Cui, Feiyang Chen, Yi Ren, Jinglin Liu, Zhou Zhao, Baoxing Huai, Zhefeng Wang

Comments: Accepted by ACM Multimedia 2022

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[284] arXiv:2110.07537 (cross-list from eess.AS) [pdf, ps, other]: Title: Toward Degradation-Robust Voice Conversion

Authors: Chien-yu Huang, Kai-Wei Chang, Hung-yi Lee

Comments: To appear in the proceedings of ICASSP 2022, equal contribution from first two authors

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[285] arXiv:2110.07957 (cross-list from eess.AS) [pdf, other]: Title: Don't speak too fast: The impact of data bias on self-supervised speech models

Authors: Yen Meng, Yi-Hui Chou, Andy T. Liu, Hung-yi Lee

Comments: Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[286] arXiv:2110.08243 (cross-list from eess.AS) [pdf, other]: Title: Neural Dubber: Dubbing for Videos According to Scripts

Authors: Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao

Comments: Accepted by NeurIPS 2021; Project page at this https URL

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[287] arXiv:2110.08545 (cross-list from eess.AS) [pdf, other]: Title: A Unified Speaker Adaptation Approach for ASR

Authors: Yingzhu Zhao, Chongjia Ni, Cheung-Chi Leung, Shafiq Joty, Eng Siong Chng, Bin Ma

Comments: Accepted by EMNLP 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[288] arXiv:2110.08583 (cross-list from eess.AS) [pdf, ps, other]: Title: ASR4REAL: An extended benchmark for speech models

Authors: Morgane Riviere, Jade Copet, Gabriel Synnaeve

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[289] arXiv:2110.08598 (cross-list from eess.AS) [pdf, other]: Title: A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

Authors: Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee

Comments: Accepted to ICASSP 2022. Code is available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
[290] arXiv:2110.08813 (cross-list from eess.AS) [pdf, other]: Title: VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

Authors: Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

Comments: 5 pages, ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[291] arXiv:2110.08862 (cross-list from eess.AS) [pdf, other]: Title: Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Features

Authors: Wei-Han Hsu, Bo-Yu Chen, Yi-Hsuan Yang

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[292] arXiv:2110.09000 (cross-list from eess.AS) [pdf, other]: Title: Supervised Metric Learning for Music Structure Features

Authors: Ju-Chiang Wang, Jordan B. L. Smith, Wei-Tsung Lu, Xuchen Song

Comments: This paper was accepted and presented at ISMIR 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[293] arXiv:2110.09019 (cross-list from eess.AS) [pdf, ps, other]: Title: Similarity-and-Independence-Aware Beamformer with Iterative Casting and Boost Start for Target Source Extraction Using Reference

Authors: Atsuo Hiroe

Comments: Accepted for publication as a regular paper in the IEEE Open Journal of Signal Processing (2021)

Journal-ref: A. Hiroe, "Similarity-and-Independence-Aware Beam-former with Iterative Casting and Boost Start for Target Source Extraction Using Reference," in IEEE Open Journal of Signal Processing, 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[294] arXiv:2110.09150 (cross-list from eess.AS) [pdf, other]: Title: Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information

Authors: Jenthe Thienpondt, Brecht Desplanques, Kris Demuynck

Comments: proceedings of ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[295] arXiv:2110.09625 (cross-list from eess.AS) [pdf, other]: Title: Personalized Speech Enhancement: New Models and Comprehensive Evaluation

Authors: Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[296] arXiv:2110.09890 (cross-list from eess.AS) [pdf, other]: Title: Multi-Modal Pre-Training for Automated Speech Recognition

Authors: David M. Chan, Shalini Ghosh, Debmalya Chakrabarty, Björn Hoffmeister

Comments: Presented at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[297] arXiv:2110.09923 (cross-list from eess.AS) [pdf, ps, other]: Title: Speech Enhancement-assisted Voice Conversion in Noisy Environments

Authors: Yun-Ju Chan, Chiang-Jen Peng, Syu-Siang Wang, Hsin-Min Wang, Yu Tsao, Tai-Shih Chi

Journal-ref: APSIPA 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[298] arXiv:2110.09924 (cross-list from eess.AS) [pdf, ps, other]: Title: Speech Enhancement Based on Cyclegan with Noise-informed Training

Authors: Wen-Yuan Ting, Syu-Siang Wang, Hsin-Li Chang, Borching Su, Yu Tsao

Journal-ref: ISCSLP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[299] arXiv:2110.09928 (cross-list from eess.AS) [pdf, other]: Title: CycleFlow: Purify Information Factors by Cycle Loss

Authors: Haoran Sun, Chen Chen, Lantian Li, Dong Wang

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[300] arXiv:2110.09930 (cross-list from eess.AS) [pdf, other]: Title: Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning

Authors: Yi-Chen Chen, Shu-wen Yang, Cheng-Kuang Lee, Simon See, Hung-yi Lee

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[301] arXiv:2110.09958 (cross-list from eess.AS) [pdf, other]: Title: The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks

Authors: Darius Petermann, Gordon Wichern, Zhong-Qiu Wang, Jonathan Le Roux

Comments: Accepted to ICASSP2022. For resources and examples, see this https URL

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[302] arXiv:2110.10026 (cross-list from eess.AS) [pdf, other]: Title: Private Language Model Adaptation for Speech Recognition

Authors: Zhe Liu, Ke Li, Shreyan Bakshi, Fuchun Peng

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[303] arXiv:2110.10139 (cross-list from eess.AS) [pdf, other]: Title: Chunked Autoregressive GAN for Conditional Waveform Synthesis

Authors: Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio

Comments: Published as a conference paper at ICLR 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[304] arXiv:2110.10326 (cross-list from eess.AS) [pdf, other]: Title: Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion

Authors: Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li

Comments: Accepted by Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[305] arXiv:2110.10330 (cross-list from eess.AS) [pdf, other]: Title: One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement

Authors: Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[306] arXiv:2110.10812 (cross-list from eess.AS) [pdf, other]: Title: REAL-M: Towards Speech Separation on Real Mixtures

Authors: Cem Subakan, Mirco Ravanelli, Samuele Cornell, François Grondin

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[307] arXiv:2110.11144 (cross-list from eess.AS) [pdf, other]: Title: RCT: Random Consistency Training for Semi-supervised Sound Event Detection

Authors: Nian Shao, Erfan Loweimi, Xiaofei Li

Comments: Preprint for interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[308] arXiv:2110.11438 (cross-list from eess.AS) [pdf, ps, other]: Title: Objective Measures of Perceptual Audio Quality Reviewed: An Evaluation of Their Application Domain Dependence

Authors: Matteo Torcoli, Thorsten Kastner, Jürgen Herre

Journal-ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[309] arXiv:2110.11479 (cross-list from eess.AS) [pdf, other]: Title: Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

Authors: Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[310] arXiv:2110.12304 (cross-list from eess.AS) [pdf, other]: Title: A Study of Acoustic Features in Arabic Speaker Identification under Noisy Environmental Conditions

Authors: Zhor Benhafid, Kawthar Yasmine Zergat, Abderrahmane Amrouche

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[311] arXiv:2110.12676 (cross-list from eess.AS) [pdf, other]: Title: Controllable and Interpretable Singing Voice Decomposition via Assem-VC

Authors: Kang-wook Kim, Junhyeok Lee

Comments: Accepted to NeurIPS Workshop on ML for Creativity and Design 2021 (Oral)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[312] arXiv:2110.12820 (cross-list from eess.AS) [pdf, ps, other]: Title: On Synchronization of Wireless Acoustic Sensor Networks in the Presence of Time-varying Sampling Rate Offsets and Speaker Changes

Authors: Tobias Gburrek, Joerg Schmalenstroeer, Reinhold Haeb-Umbach

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[313] arXiv:2110.13125 (cross-list from eess.AS) [pdf, ps, other]: Title: Automatic Impact-sounding Acoustic Inspection of Concrete Structure

Authors: Jinglun Feng, Hua Xiao, Ejup Hoxha, Yifeng Song, Liang Yang, Jizhong Xiao

Journal-ref: 10th International Conference on Structural Health Monitoring of Intelligent Infrastructure, SHMII 10, 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[314] arXiv:2110.13586 (cross-list from eess.AS) [pdf, other]: Title: Towards Audio Domain Adaptation for Acoustic Scene Classification using Disentanglement Learning

Authors: Jakob Abeßer, Meinard Müller

Comments: submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[315] arXiv:2110.13653 (cross-list from eess.AS) [pdf, other]: Title: Learning Speaker Representation with Semi-supervised Learning approach for Speaker Profiling

Authors: Shangeth Rajaa, Pham Van Tung, Chng Eng Siong

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[316] arXiv:2110.14139 (cross-list from eess.AS) [pdf, other]: Title: Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions

Authors: Wangyou Zhang, Jing Shi, Chenda Li, Shinji Watanabe, Yanmin Qian

Comments: 5 pages, 3 figures, accepted by IEEE WASPAA 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[317] arXiv:2110.14142 (cross-list from eess.AS) [pdf, other]: Title: Separating Long-Form Speech with Group-Wise Permutation Invariant Training

Authors: Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei

Comments: 5 pages, 3 figures, 3 tables, submitted to IEEE ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[318] arXiv:2110.14838 (cross-list from eess.AS) [pdf, other]: Title: Continuous Speech Separation with Recurrent Selective Attention Network

Authors: Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[319] arXiv:2110.15018 (cross-list from eess.AS) [pdf, other]: Title: TorchAudio: Building Blocks for Audio and Speech Processing

Authors: Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Anjali Chourdia, Artyom Astafurov, Caroline Chen, Ching-Feng Yeh, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z. Yang, Jason Lian, Jay Mahadeokar, Jeff Hwang, Ji Chen, Peter Goldsborough, Prabhat Roy, Sean Narenthiran, Shinji Watanabe, Soumith Chintala, Vincent Quenneville-Bélair, Yangyang Shi

Comments: Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[320] arXiv:2110.15581 (cross-list from eess.AS) [pdf, other]: Title: SA-SDR: A novel loss function for separation of meeting style data

Authors: Thilo von Neumann, Keisuke Kinoshita, Christoph Boeddeker, Marc Delcroix, Reinhold Haeb-Umbach

Comments: accepted at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[321] arXiv:2110.15593 (cross-list from q-bio.QM) [pdf, ps, other]: Title: Towards automatic detection and classification of orca (Orcinus orca) calls using cross-correlation methods

Authors: Stefano Palmero, Carlo Guidi, Vladimir Kulikovskiy, Matteo Sanguineti, Michele Manghi, Matteo Sommer, Gaia Pesce

Comments: 26 pages, 6 figures

Subjects: Quantitative Methods (q-bio.QM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[322] arXiv:2110.15684 (cross-list from eess.AS) [pdf, other]: Title: Fusing ASR Outputs in Joint Training for Speech Emotion Recognition

Authors: Yuanchao Li, Peter Bell, Catherine Lai

Comments: Accepted for ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)

