Audio and Speech Processing

Authors and titles for eess.AS in Nov 2021

[ total of 204 entries: 1-204 ]

[1] arXiv:2111.00009 [pdf, other]: Title: Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

Authors: Martin Kocour, Kateřina Žmolíková, Lucas Ondel, Ján Švec, Marc Delcroix, Tsubasa Ochiai, Lukáš Burget, Jan Černocký

Comments: submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[2] arXiv:2111.00030 [pdf, other]: Title: Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers

Authors: Sharath Adavanne, Archontis Politis, Tuomas Virtanen

Comments: Submitted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA2021)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[3] arXiv:2111.00127 [pdf, other]: Title: Cross-attention conformer for context modeling in speech enhancement for ASR

Authors: Arun Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He

Comments: Will appear in IEEE-ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[4] arXiv:2111.00242 [pdf, ps, other]: Title: Self-Supervised Speech Denoising Using Only Noisy Audio Signals

Authors: Jiasong Wu, Qingchun Li, Guanyu Yang, Lei Li, Lotfi Senhadji, Huazhong Shu

Comments: 11 pages, 4 figures, 6 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[5] arXiv:2111.00316 [pdf, other]: Title: Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network

Authors: Midia Yousefi, John H.L. Hansen

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[6] arXiv:2111.00320 [pdf, other]: Title: Speaker conditioning of acoustic models using affine transformation for multi-speaker speech recognition

Authors: Midia Yousefi, John H.L. Hansen

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[7] arXiv:2111.00764 [pdf, other]: Title: SNRi Target Training for Joint Speech Enhancement and Recognition

Authors: Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani

Comments: Submitted to Interspeech 2022 (v1 has been rejected from ICASSP 2022)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2111.01320 [pdf, other]: Title: AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence

Authors: Yun-Ning Hung, Karn N. Watcharasupat, Chih-Wei Wu, Iroro Orife, Kelian Li, Pavan Seshadri, Junyoung Lee

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2111.01326 [pdf, other]: Title: Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

Authors: Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W Black

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[10] arXiv:2111.01652 [pdf, other]: Title: Design and Evaluation of Active Noise Control on Machinery Noise

Authors: Shulin Wen, Duy Hai Nguyen, Miqing Wang, Woon-Seng Gan

Journal-ref: APSIPA 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Systems and Control (eess.SY)
[11] arXiv:2111.01690 [pdf, other]: Title: Recent Advances in End-to-End Automatic Speech Recognition

Authors: Jinyu Li

Comments: Accepted at APSIPA Transactions on Signal and Information Processing

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[12] arXiv:2111.01710 [pdf, other]: Title: Multi-input Architecture and Disentangled Representation Learning for Multi-dimensional Modeling of Music Similarity

Authors: Sebastian Ribecky, Jakob Abeßer, Hanna Lukashevich

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[13] arXiv:2111.01914 [pdf, other]: Title: Reduction of Subjective Listening Effort for TV Broadcast Signals with Recurrent Neural Networks

Authors: Nils L. Westhausen, Rainer Huber, Hannah Baumgartner, Ragini Sinha, Jan Rennies, Bernd T. Meyer

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. This version is the authors' version and may vary from the final publication in details

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14] arXiv:2111.02363 [pdf, other]: Title: Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features

Authors: Ryandhimas E. Zezario, Szu-Wei Fu, Fei Chen, Chiou-Shann Fuh, Hsin-Min Wang, Yu Tsao

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[15] arXiv:2111.02392 [pdf, other]: Title: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion

Authors: Benjamin van Niekerk, Marc-André Carbonneau, Julian Zaïdi, Mathew Baas, Hugo Seuté, Herman Kamper

Comments: 5 pages, 2 figures, 2 tables. Accepted at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2111.02674 [pdf, other]: Title: Voice Conversion Can Improve ASR in Very Low-Resource Settings

Authors: Matthew Baas, Herman Kamper

Comments: 5 pages, 4 tables, 2 figures. Accepted at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[17] arXiv:2111.03470 [pdf, other]: Title: ParsiNorm: A Persian Toolkit for Speech Processing Normalization

Authors: Romina Oji, Seyedeh Fatemeh Razavi, Sajjad Abdi Dehsorkh, Alireza Hariri, Hadi Asheri, Reshad Hosseini

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
[18] arXiv:2111.03482 [pdf, other]: Title: Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification

Authors: Jiri Malek, Jakub Jansky, Zbynek Koldovsky, Tomas Kounovsky, Jaroslav Cmejla, Jindrich Zdansky

Comments: Modified version of the article accepted for publication in IEEE/ACM Transactions on Audio Speech and Language Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusions

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2295-2309, 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19] arXiv:2111.03600 [pdf, other]: Title: Hybrid Spectrogram and Waveform Source Separation

Authors: Alexandre Défossez

Comments: ISMIR 2021 MDX Workshop, 11 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
[20] arXiv:2111.03842 [pdf, other]: Title: Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems

Authors: Victoria Mingote, Antonio Miguel, Alfonso Ortega, Eduardo Lleida

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[21] arXiv:2111.03847 [pdf, other]: Title: Deep Noise Suppression Maximizing Non-Differentiable PESQ Mediated by a Non-Intrusive PESQNet

Authors: Ziyi Xu, Maximilian Strake, Tim Fingscheidt

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22] arXiv:2111.04063 [pdf, other]: Title: LiMuSE: Lightweight Multi-modal Speaker Extraction

Authors: Qinghua Liu, Yating Huang, Yunzhe Hao, Jiaming Xu, Bo Xu

Comments: Accepted to IEEE SLT 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[23] arXiv:2111.04312 [pdf, ps, other]: Title: Inter-channel Conv-TasNet for multichannel speech enhancement

Authors: Dongheon Lee, Seongrae Kim, Jung-Woo Choi

Comments: 10 pages, this work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24] arXiv:2111.04433 [pdf, other]: Title: RawBoost: A Raw Data Boosting and Augmentation Method applied to Automatic Speaker Verification Anti-Spoofing

Authors: Hemlata Tak, Madhu Kamble, Jose Patino, Massimiliano Todisco, Nicholas Evans

Comments: Accepted to IEEE ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD); Signal Processing (eess.SP)
[25] arXiv:2111.04614 [pdf, other]: Title: Learning Filterbanks for End-to-End Acoustic Beamforming

Authors: Samuele Cornell, Manuel Pariente, François Grondin, Stefano Squartini

Comments: accepted at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[26] arXiv:2111.04637 [pdf, other]: Title: A hemispheric two-channel code accounts for binaural unmasking in humans

Authors: Jörg Encke, Mathias Dietz

Journal-ref: Commun Biol 5, 1122 (2022)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[27] arXiv:2111.04730 [pdf, other]: Title: Emotional Prosody Control for Speech Generation

Authors: Sarath Sivaprasad, Saiteja Kosgi, Vineet Gandhi

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[28] arXiv:2111.04904 [pdf, other]: Title: Joint Neural AEC and Beamforming with Double-Talk Detection

Authors: Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

Comments: Accepted in Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[29] arXiv:2111.05691 [pdf, other]: Title: HASA-net: A non-intrusive hearing-aid speech assessment network

Authors: Hsin-Tien Chiang, Yi-Chiao Wu, Cheng Yu, Tomoki Toda, Hsin-Min Wang, Yih-Chun Hu, Yu Tsao

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[30] arXiv:2111.05703 [pdf, other]: Title: OSSEM: one-shot speaker adaptive speech enhancement using meta learning

Authors: Cheng Yu, Szu-Wei Fu, Tsun-An Hsieh, Yu Tsao, Mirco Ravanelli

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[31] arXiv:2111.06015 [pdf, other]: Title: Uformer: A Unet based dilated complex & real dual-path conformer network for simultaneous speech enhancement and dereverberation

Authors: Yihui Fu, Yun Liu, Jingdong Li, Dawei Luo, Shubo Lv, Yukai Jv, Lei Xie

Comments: Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[32] arXiv:2111.06458 [pdf, other]: Title: MultiSV: Dataset for Far-Field Multi-Channel Speaker Verification

Authors: Ladislav Mošner, Oldřich Plchot, Lukáš Burget, Jan Černocký

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[33] arXiv:2111.06539 [pdf, other]: Title: Disentangling Physical Parameters for Anomalous Sound Detection Under Domain Shifts

Authors: Kota Dohi, Takashi Endo, Yohei Kawaguchi

Comments: 4 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[34] arXiv:2111.06601 [pdf, other]: Title: AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion

Authors: Damien Ronssin, Milos Cernak

Comments: ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[35] arXiv:2111.06671 [pdf, ps, other]: Title: HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE

Authors: Rohan Kumar Das, Ruijie Tao, Haizhou Li

Comments: 3 pages

Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[36] arXiv:2111.07218 [pdf, other]: Title: Meta-Voice: Fast few-shot style transfer for expressive voice cloning using meta learning

Authors: Songxiang Liu, Dan Su, Dong Yu

Comments: Pre-print technical report, 6 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[37] arXiv:2111.07578 [pdf, other]: Title: Monaural source separation: From anechoic to reverberant environments

Authors: Tobias Cord-Landwehr, Christoph Boeddeker, Thilo von Neumann, Catalin Zorila, Rama Doddipatla, Reinhold Haeb-Umbach

Comments: Submitted to IWAENC 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[38] arXiv:2111.07725 [pdf, other]: Title: Investigating self-supervised front ends for speech spoofing countermeasures

Authors: Xin Wang, Junichi Yamagishi

Comments: V3: added sub-band analysis, submitted to ISCA Odyssey2022; V2: added min tDCF results on 2019 and 2021 LA. EERs on LA 2021 were slightly updated to fix one glitch in the score file. EERs and min tDCFs on 2021 LA and DF can be computed using the latest official code this https URL Work in progress. Feedback is welcome!

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[39] arXiv:2111.07759 [pdf, other]: Title: Joint Far- and Near-End Speech Intelligibility Enhancement based on the Approximated Speech Intelligibility Index

Authors: Andreas Jonas Fuglsig, Jan Østergaard, Jesper Jensen, Lars Søndergaard Bertelsen, Peter Mariager, Zheng-Hua Tan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40] arXiv:2111.08112 [pdf, other]: Title: Biologically inspired speech emotion recognition

Authors: Reza Lotfidereshgi, Philippe Gournay

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[41] arXiv:2111.08116 [pdf, other]: Title: Speech Prediction using an Adaptive Recurrent Neural Network with Application to Packet Loss Concealment

Authors: Reza Lotfidereshgi, Philippe Gournay

Subjects: Audio and Speech Processing (eess.AS)
[42] arXiv:2111.08192 [pdf, other]: Title: SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays

Authors: Thi Ngoc Tho Nguyen, Douglas L. Jones, Karn N. Watcharasupat, Huy Phan, Woon-Seng Gan

Comments: arXiv admin note: text overlap with arXiv:2110.00275

Journal-ref: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 716-720

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43] arXiv:2111.08201 [pdf, other]: Title: Attention-based Multi-hypothesis Fusion for Speech Summarization

Authors: Takatomo Kano, Atsunori Ogawa, Marc Delcroix, Shinji Watanabe

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[44] arXiv:2111.08387 [pdf, other]: Title: S-DCCRN: Super Wide Band DCCRN with learnable complex feature for speech enhancement

Authors: Shubo Lv, Yihui Fu, Mengtao Xing, Jiayao Sun, Lei Xie, Jun Huang, Yannan Wang, Tao Yu

Comments: Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[45] arXiv:2111.08635 [pdf, other]: Title: Single-channel speech separation using Soft-minimum Permutation Invariant Training

Authors: Midia Yousefi, John H.L. Hansen

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[46] arXiv:2111.08678 [pdf, ps, other]: Title: Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Authors: Viet Anh Trinh (1), Sebastian Braun (2) ((1) CUNY Graduate Center, (2) Microsoft Research)

Comments: To appear in Proceedings of ICASSP 2022, May 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47] arXiv:2111.09372 [pdf, other]: Title: BLOOM-Net: Blockwise Optimization for Masking Networks Toward Scalable and Efficient Speech Enhancement

Authors: Sunwoo Kim, Minje Kim

Comments: 5 pages, 3 figures, ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[48] arXiv:2111.09935 [pdf, other]: Title: A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

Authors: Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard

Comments: Will appear in IEEE-ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[49] arXiv:2111.09983 [pdf, other]: Title: Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

Authors: Chunxi Liu, Michael Picheny, Leda Sarı, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

Comments: Submitted to ICASSP 2022. Our dataset will be publicly available at (this https URL) for general use. We also note that, considering the limitations of our dataset, we restrict its use to evaluation purposes only (see license agreement)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50] arXiv:2111.10043 [pdf, other]: Title: A comparison of streaming models and data augmentation methods for robust speech recognition

Authors: Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

Comments: Accepted as a conference paper at ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[51] arXiv:2111.10047 [pdf, other]: Title: Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages

Authors: Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

Comments: Accepted as a conference paper at ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[52] arXiv:2111.10202 [pdf, other]: Title: Multimodal Emotion Recognition with High-level Speech and Text Features

Authors: Mariana Rodrigues Makiuchi, Kuniaki Uto, Koichi Shinoda

Comments: Accepted at ASRU 2021. Code available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[53] arXiv:2111.10207 [pdf, ps, other]: Title: Comparative Study of Speech Analysis Methods to Predict Parkinson's Disease

Authors: Adedolapo Aishat Toye, Suryaprakash Kompalli

Comments: Machine Learning for Health (ML4H) - Extended Abstract

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
[54] arXiv:2111.10208 [pdf, other]: Title: Attention based end to end Speech Recognition for Voice Search in Hindi and English

Authors: Raviraj Joshi, Venkateshan Kannan

Comments: Accepted at Forum for Information Retrieval Evaluation (FIRE) 2021

Subjects: Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD)
[55] arXiv:2111.10574 [pdf, ps, other]: Title: Switching Independent Vector Analysis and Its Extension to Blind and Spatially Guided Convolutional Beamforming Algorithms

Authors: Tomohiro Nakatani, Rintaro Ikeshita, Keisuke Kinoshita, Hiroshi Sawada, Naoyuki Kamo, Shoko Araki

Comments: Submitted to IEEE/ACM Trans. Audio, Speech, and Language Processing on 27 July 2021, accepted on 22 Feb. 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[56] arXiv:2111.10891 [pdf, other]: Title: Active Restoration of Lost Audio Signals Using Machine Learning and Latent Information

Authors: Zohra Adila Cheddad, Abbas Cheddad

Comments: 18 Pages, 2 Tables, 8 Figures

Journal-ref: Lecture Notes in Networks and Systems, vol 822, 2024, Springer, Cham

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[57] arXiv:2111.11045 [pdf, other]: Title: Sound Field Reproduction With Weighted Mode Matching and Infinite-Dimensional Harmonic Analysis: An Experimental Evaluation

Authors: Shoichi Koyama, Keisuke Kimura, Natsuki Ueno

Comments: Accepted to International Conference on Immersive and 3D Audio (I3DA) 2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58] arXiv:2111.11606 [pdf, other]: Title: Effect of noise suppression losses on speech distortion and ASR performance

Authors: Sebastian Braun, Hannes Gamper

Comments: submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59] arXiv:2111.11831 [pdf, other]: Title: SpeechMoE2: Mixture-of-Experts Model with Improved Routing

Authors: Zhao You, Shulin Feng, Dan Su, Dong Yu

Comments: 5 pages, 1 figure. Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[60] arXiv:2111.11882 [pdf, other]: Title: Dataset of Spatial Room Impulse Responses in a Variable Acoustics Room for Six Degrees-of-Freedom Rendering and Analysis

Authors: Thomas McKenzie, Leo McCormack, Christoph Hold

Comments: 3 pages, 3 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[61] arXiv:2111.12203 [pdf, other]: Title: KUIELab-MDX-Net: A Two-Stream Neural Network for Music Demixing

Authors: Minseok Kim, Woosung Choi, Jaehwa Chung, Daewon Lee, Soonyoung Jung

Comments: MDX Workshop @ ISMIR 2021, 7 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62] arXiv:2111.12277 [pdf, other]: Title: One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation

Authors: Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

Comments: Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[63] arXiv:2111.12516 [pdf, other]: Title: LightSAFT: Lightweight Latent Source Aware Frequency Transform for Source Separation

Authors: Yeong-Seok Jeong, Jinsung Kim, Woosung Choi, Jaehwa Chung, Soonyoung Jung

Comments: MDX Workshop @ ISMIR 2021, 6 pages, 1 figure

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[64] arXiv:2111.13321 [pdf, other]: Title: Learning source-aware representations of music in a discrete latent space

Authors: Jinsung Kim, Yeong-Seok Jeong, Woosung Choi, Jaehwa Chung, Soonyoung Jung

Comments: MDX Workshop @ ISMIR 2021, 7 pages, 2 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[65] arXiv:2111.13803 [pdf, other]: Title: Low-Latency Online Speaker Diarization with Graph-Based Label Generation

Authors: Yucong Zhang, Qinjian Lin, Weiqing Wang, Lin Yang, Xuyang Wang, Junjie Wang, Ming Li

Comments: accepted by Odyssey 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[66] arXiv:2111.14200 [pdf, ps, other]: Title: Transfer Learning with Jukebox for Music Source Separation

Authors: W. Zai El Amri, O. Tautz, H. Ritter, A. Melnik

Comments: Conference paper (AIAI 2022), 4 pages, 2 figures, 2 tables

Journal-ref: Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 647) 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[67] arXiv:2111.14842 [pdf, other]: Title: Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?

Authors: Lasse Borgholt, Jakob Drachmann Havtorn, Mostafa Abdou, Joakim Edin, Lars Maaløe, Anders Søgaard, Christian Igel

Comments: Under review as a conference paper at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
[68] arXiv:2111.01593 (cross-list from eess.SP) [pdf, other]: Title: Design of Tight Minimum-Sidelobe Windows by Riemannian Newton's Method

Authors: Daichi Kitahara, Kohei Yatabe

Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS); Numerical Analysis (math.NA); Optimization and Control (math.OC)
[69] arXiv:2111.08503 (cross-list from eess.SP) [pdf, other]: Title: Binary classification of spoken words with passive phononic metamaterials

Authors: Tena Dubček, Daniel Moreno-Garcia, Thomas Haag, Parisa Omidvar, Henrik R. Thomsen, Theodor S. Becker, Lars Gebraad, Christoph Bärlocher, Fredrik Andersson, Sebastian D. Huber, Dirk-Jan van Manen, Luis Guillermo Villanueva, Johan O.A. Robertsson, Marc Serra-Garcia

Comments: 13 pages, 11 figures

Subjects: Signal Processing (eess.SP); Disordered Systems and Neural Networks (cond-mat.dis-nn); Emerging Technologies (cs.ET); Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[70] arXiv:2111.00161 (cross-list from cs.CL) [pdf, other]: Title: Pseudo-Labeling for Massively Multilingual Speech Recognition

Authors: Loren Lugosch, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

Comments: Accepted to ICASSP 2022. New version has links to code/models + more training curves for larger model. (Fixed code link.)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71] arXiv:2111.00195 (cross-list from cs.SD) [pdf, other]: Title: Learning Continuous Representation of Audio for Arbitrary Scale Super Resolution

Authors: Jaechang Kim, Yunjoo Lee, Seunghoon Hong, Jungseul Ok

Comments: Accepted by ICASSP 2022. The source code is available at this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72] arXiv:2111.00400 (cross-list from cs.CL) [pdf, other]: Title: FANS: Fusing ASR and NLU for on-device SLU

Authors: Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow

Comments: Published in Interspeech 2021

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2111.00404 (cross-list from cs.SD) [pdf, other]: Title: Speech Emotion Recognition Using Quaternion Convolutional Neural Networks

Authors: Aneesh Muppidi, Martin Radfar

Comments: Published in ICASSP 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[74] arXiv:2111.00436 (cross-list from cs.SD) [pdf, ps, other]: Title: Analysis of North Indian Classical Ragas Using Tonnetz

Authors: Ananya Giri

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75] arXiv:2111.00610 (cross-list from cs.CL) [pdf, other]: Title: Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units

Authors: Anurag Katakkar, Alan W Black

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[76] arXiv:2111.00704 (cross-list from cs.SD) [pdf, other]: Title: A Novel 1D State Space for Efficient Music Rhythmic Analysis

Authors: Mojtaba Heydari, Matthew McCallum, Andreas Ehmann, Zhiyao Duan

Comments: International Conference on Acoustics, Speech and Signal Processing (ICASSP), May 2022

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[77] arXiv:2111.00868 (cross-list from cs.SD) [pdf, other]: Title: A mathematical model of the vowel space

Authors: Frédéric Berthommier (GIPSA-PCMD)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Classical Physics (physics.class-ph); Populations and Evolution (q-bio.PE)
[78] arXiv:2111.00962 (cross-list from cs.SD) [pdf, other]: Title: RefineGAN: Universally Generating Waveform Better than Ground Truth with Highly Accurate Pitch and Intensity Responses

Authors: Shengyuan Xu, Wenxiao Zhao, Jing Guo

Comments: Submitted to INTERSPEECH2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[79] arXiv:2111.00976 (cross-list from cs.CL) [pdf, other]: Title: A transfer learning based approach for pronunciation scoring

Authors: Marcelo Sancinetti, Jazmin Vidal, Cyntia Bonomi, Luciana Ferrer

Comments: ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2111.01024 (cross-list from cs.CV) [pdf, other]: Title: With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

Authors: Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen

Comments: Accepted at BMVC 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2111.01205 (cross-list from cs.SD) [pdf, other]: Title: Evaluating robustness of You Only Hear Once (YOHO) Algorithm on noisy audios in the VOICe Dataset

Authors: Soham Tiwari, Kshitiz Lakhotia, Manjunath Mulimani

Comments: 7 pages, 1 figure, 3 tables, Efficient Natural Language and Speech Processing Workshop, NeurIPS 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[82] arXiv:2111.01216 (cross-list from cs.SD) [pdf, other]: Title: Learning To Generate Piano Music With Sustain Pedals

Authors: Joann Ching, Yi-Hsuan Yang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[83] arXiv:2111.01272 (cross-list from cs.CL) [pdf, other]: Title: Sequence Transduction with Graph-based Supervision

Authors: Niko Moritz, Takaaki Hori, Shinji Watanabe, Jonathan Le Roux

Comments: Accepted for publication at IEEE ICASSP 2022

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2111.01342 (cross-list from cs.SD) [pdf, other]: Title: Attention-Guided Generative Adversarial Network for Whisper to Normal Speech Conversion

Authors: Teng Gao, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[85] arXiv:2111.01430 (cross-list from cs.SD) [pdf, other]: Title: CycleGAN with Dual Adversarial Loss for Bone-Conducted Speech Enhancement

Authors: Qing Pan, Teng Gao, Jian Zhou, Huabin Wang, Liang Tao, Hon Keung Kwan

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2111.02006 (cross-list from cs.SD) [pdf, other]: Title: A Strongly-Labelled Polyphonic Dataset of Urban Sounds with Spatiotemporal Context

Authors: Kenneth Ooi, Karn N. Watcharasupat, Santi Peksi, Furi Andi Karnapi, Zhen-Ting Ong, Danny Chua, Hui-Wen Leow, Li-Long Kwok, Xin-Lei Ng, Zhen-Ann Loh, Woon-Seng Gan

Comments: 7 pages, 8 figures, 3 tables. To be published in Proceedings of the 13th Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

Journal-ref: Proceedings of the 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2021, pp. 982-988

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2111.02041 (cross-list from cs.SD) [pdf, other]: Title: A Comparative Study of Speaker Role Identification in Air Traffic Communication Using Deep Learning Approaches

Authors: Dongyue Guo, Jianwei Zhang, Bo Yang, Yi Lin

Comments: This work has been submitted to the ACM TALLIP for possible publication

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[88] arXiv:2111.02216 (cross-list from cs.CL) [pdf, other]: Title: Automatic Embedding of Stories Into Collections of Independent Media

Authors: Dylan R. Ashley, Vincent Herrmann, Zachary Friggstad, Kory W. Mathewson, Jürgen Schmidhuber

Comments: 2 pages in main text + 1 page of references + 6 pages of appendices, 2 figures in main text + 3 figures in appendices, 1 algorithm in appendices; source code available at this https URL

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2111.02298 (cross-list from cs.SD) [pdf, other]: Title: STC speaker recognition systems for the NIST SRE 2021

Authors: Anastasia Avdeeva, Aleksei Gusev, Igor Korsunov, Alexander Kozlov, Galina Lavrentyeva, Sergey Novoselov, Timur Pekhovsky, Andrey Shulipa, Alisa Vinogradova, Vladimir Volokhov, Evgeny Smirnov, Vasily Galyuk

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[90] arXiv:2111.02351 (cross-list from cs.SD) [pdf, other]: Title: Weight, Block or Unit? Exploring Sparsity Tradeoffs for Speech Enhancement on Tiny Neural Accelerators

Authors: Marko Stamenovic, Nils L. Westhausen, Li-Chia Yang, Carl Jensen, Alex Pawlicki

Comments: To appear in NeurIPS 2021 Efficient Natural Language and Speech Processing Workshop as an oral-spotlight presentation

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[91] arXiv:2111.02585 (cross-list from cs.SD) [pdf, other]: Title: InQSS: a speech intelligibility and quality assessment model using a multi-task learning network

Authors: Yu-Wen Chen, Yu Tsao

Comments: accepted by Interspeech 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[92] arXiv:2111.02654 (cross-list from cs.SD) [pdf, other]: Title: Speech recognition for air traffic control via feature learning and end-to-end training

Authors: Peng Fan, Dongyue Guo, Yi Lin, Bo Yang, Jianwei Zhang

Comments: Submitted to IEEE ICASSP 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[93] arXiv:2111.02735 (cross-list from cs.CL) [pdf, other]: Title: A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

Authors: Yingzhi Wang, Abdelmoumene Boumadane, Abdelwahab Heba

Comments: 7 pages, 2 figures

Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2111.02813 (cross-list from cs.LG) [pdf, other]: Title: WaveFake: A Data Set to Facilitate Audio Deepfake Detection

Authors: Joel Frank, Lea Schönherr

Comments: Accepted to NeurIPS 2021 (Benchmark and Dataset Track); Code: this https URL; Data: this https URL

Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2111.03017 (cross-list from cs.SD) [pdf, other]: Title: MT3: Multi-Task Multitrack Music Transcription

Authors: Josh Gardner, Ian Simon, Ethan Manilow, Curtis Hawthorne, Jesse Engel

Comments: ICLR 2022 camera-ready version

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[96] arXiv:2111.03146 (cross-list from cs.LG) [pdf, other]: Title: Generating Diverse Realistic Laughter for Interactive Art

Authors: M. Mehdi Afsar, Eric Park, Étienne Paquette, Gauthier Gidel, Kory W. Mathewson, Eilif Muller

Comments: Presented at Machine Learning for Creativity and Design workshop at NeurIPS 2021, 6 pages

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:2111.03250 (cross-list from cs.CL) [pdf, other]: Title: Context-Aware Transformer Transducer for Speech Recognition

Authors: Feng-Ju Chang, Jing Liu, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo, Ariya Rastrow, Siegfried Kunzmann

Comments: Accepted to ASRU 2021

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2111.03333 (cross-list from cs.CL) [pdf, ps, other]: Title: Effective Cross-Utterance Language Modeling for Conversational Speech Recognition

Authors: Bi-Cheng Yan, Hsin-Wei Wang, Shih-Hsuan Chiu, Hsuan-Sheng Chiu, Berlin Chen

Comments: 6 pages, 6 figures, and 4 tables. Accepted by 2022 International Joint Conference on Neural Networks (IJCNN 2022)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2111.03442 (cross-list from cs.CL) [pdf, other]: Title: Conformer-based Hybrid ASR System for Switchboard Dataset

Authors: Mohammad Zeineldeen, Jingjing Xu, Christoph Lüscher, Wilfried Michel, Alexander Gerstenberger, Ralf Schlüter, Hermann Ney

Comments: Accepted at ICASSP 2022

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[100] arXiv:2111.03629 (cross-list from cs.SD) [src]: Title: Objective measurement of pitch extractors' responses to frequency modulated sounds and two reference pitch extraction methods for analyzing voice pitch responses to auditory stimulation

Authors: Hideki Kawahara, Kohei Yatabe, Ken-Ichi Sakakibara, Tatsuya Kitamura, Hideki Banno, Masanori Morise

Comments: Rejected by ICASSP 2022. A substantially revised version was submitted to and accepted at Interspeech 2022; it is arXiv:2204.00911

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[101] arXiv:2111.03664 (cross-list from cs.LG) [pdf, other]: Title: Oracle Teacher: Leveraging Target Information for Better Knowledge Distillation of CTC Models

Authors: Ji Won Yoon, Hyung Yong Kim, Hyeonseung Lee, Sunghwan Ahn, Nam Soo Kim

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

Subjects: Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[102] arXiv:2111.03777 (cross-list from cs.CL) [pdf, other]: Title: Privacy attacks for automatic speech recognition acoustic models in a federated learning framework

Authors: Natalia Tomashenko, Salima Mdhaffar, Marc Tommasi, Yannick Estève, Jean-François Bonastre

Comments: Submitted to ICASSP 2022

Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6972-6976

Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2111.03811 (cross-list from cs.SD) [pdf, other]: Title: SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System for Both Human Beings and Machines

Authors: Haozhe Zhang, Zexin Cai, Xiaoyi Qin, Ming Li

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[104] arXiv:2111.03895 (cross-list from cs.SD) [pdf, ps, other]: Title: Digital Audio Processing Tools for Music Corpus Studies

Authors: Johanna Devaney

Comments: Preprint of book chapter: Devaney, J. (In Press). Audio processing tools for music corpus studies. In D. Shanahan, A. Burgoyne, & I. Quinn (Eds.), Oxford Handbook of Music and Corpus Studies. New York: Oxford University Press. The manuscript contains 6 figures and 3 tables

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105] arXiv:2111.03945 (cross-list from cs.CL) [pdf, other]: Title: Towards Building ASR Systems for the Next Billion Users

Authors: Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[106] arXiv:2111.03971 (cross-list from cs.SD) [pdf, ps, other]: Title: Towards noise robust trigger-word detection with contrastive learning pre-task for fast on-boarding of new trigger-words

Authors: Sivakumar Balasubramanian, Aditya Jajodia, Gowtham Srinivasan

Comments: submitted to ICMLA

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[107] arXiv:2111.04040 (cross-list from cs.SD) [pdf, other]: Title: Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech

Authors: Sung-Feng Huang, Chyi-Jiunn Lin, Da-Rong Liu, Yi-Chen Chen, Hung-yi Lee

Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1558-1571, 2022

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[108] arXiv:2111.04093 (cross-list from cs.SD) [pdf, other]: Title: Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer

Authors: Yi-Jen Shih, Shih-Lun Wu, Frank Zalkow, Meinard Müller, Yi-Hsuan Yang

Comments: to be published in IEEE Transactions on Multimedia

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[109] arXiv:2111.04194 (cross-list from cs.CL) [pdf, other]: Title: Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition

Authors: Salima Mdhaffar, Jean-François Bonastre, Marc Tommasi, Natalia Tomashenko, Yannick Estève

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[110] arXiv:2111.04330 (cross-list from cs.SD) [pdf, other]: Title: Characterizing the adversarial vulnerability of speech self-supervised learning

Authors: Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-yi Lee, Helen Meng

Comments: Accepted by ICASSP 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[111] arXiv:2111.04436 (cross-list from cs.SD) [pdf, other]: Title: SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points

Authors: Yu-Chen Lin, Cheng Yu, Yi-Te Hsu, Szu-Wei Fu, Yu Tsao, Tei-Wei Kuo

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[112] arXiv:2111.04823 (cross-list from cs.CL) [pdf, other]: Title: Cascaded Multilingual Audio-Visual Learning from Videos

Authors: Andrew Rouditchenko, Angie Boggust, David Harwath, Samuel Thomas, Hilde Kuehne, Brian Chen, Rameswar Panda, Rogerio Feris, Brian Kingsbury, Michael Picheny, James Glass

Comments: Presented at Interspeech 2021. This version contains updated results using the YouCook-Japanese dataset

Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[113] arXiv:2111.04988 (cross-list from cs.SD) [pdf, other]: Title: Ultra-Low Power Keyword Spotting at the Edge

Authors: Mehmet Gorkem Ulkar, Osman Erman Okman

Comments: 5 pages, 5 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[114] arXiv:2111.05011 (cross-list from cs.LG) [pdf, other]: Title: RAVE: A variational autoencoder for fast and high-quality neural audio synthesis

Authors: Antoine Caillon, Philippe Esling

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115] arXiv:2111.05095 (cross-list from cs.SD) [pdf, other]: Title: Speaker Generation

Authors: Daisy Stanton, Matt Shannon, Soroosh Mariooryad, RJ Skerry-Ryan, Eric Battenberg, Tom Bagby, David Kao

Comments: 12 pages, 3 figures, 4 tables, appendix with 2 tables

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[116] arXiv:2111.05113 (cross-list from cs.CR) [pdf, other]: Title: Membership Inference Attacks Against Self-supervised Speech Models

Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

Comments: Accepted to Interspeech 2022. Code will be available in the future

Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117] arXiv:2111.05128 (cross-list from cs.LG) [pdf, other]: Title: Losses, Dissonances, and Distortions

Authors: Pablo Samuel Castro

Comments: In the 5th Machine Learning for Creativity and Design Workshop at NeurIPS 2021

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118] arXiv:2111.05174 (cross-list from cs.SD) [pdf, other]: Title: CAESynth: Real-Time Timbre Interpolation and Pitch Control with Conditional Autoencoders

Authors: Aaron Valero Puche, Sukhan Lee

Comments: MLSP 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[119] arXiv:2111.05222 (cross-list from cs.CV) [pdf, other]: Title: Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition

Authors: Gnana Praveen R, Eric Granger, Patrick Cardinal

Comments: Accepted at FG 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[120] arXiv:2111.05592 (cross-list from cs.SD) [pdf, other]: Title: Improving the Chamberlin Digital State Variable Filter

Authors: Victor Lazzarini, Joseph Timoney

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121] arXiv:2111.05846 (cross-list from cs.SD) [pdf, other]: Title: Structure from Silence: Learning Scene Structure from Ambient Sound

Authors: Ziyang Chen, Xixi Hu, Andrew Owens

Comments: Accepted to CoRL 2021 (Oral Presentation)

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[122] arXiv:2111.05890 (cross-list from cs.CV) [pdf, ps, other]: Title: Multimodal End-to-End Group Emotion Recognition using Cross-Modal Attention

Authors: Lev Evtodienko

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123] arXiv:2111.05895 (cross-list from cs.SD) [pdf, other]: Title: A Generic Deep Learning Based Cough Analysis System from Clinically Validated Samples for Point-of-Need Covid-19 Test and Severity Levels

Authors: Javier Andreu-Perez, Humberto Pérez-Espinosa, Eva Timonet, Mehrin Kiani, Manuel I. Girón-Pérez, Alma B. Benitez-Trinidad, Delaram Jarchi, Alejandro Rosales-Pérez, Nick Gatzoulis, Orion F. Reyes-Galaviz, Alejandro Torres-García, Carlos A. Reyes-García, Zulfiqar Ali, Francisco Rivas

Journal-ref: IEEE Transactions on Services Computing (2021)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[124] arXiv:2111.05948 (cross-list from cs.CL) [pdf, other]: Title: Scaling ASR Improves Zero and Few Shot Learning

Authors: Alex Xiao, Weiyi Zheng, Gil Keren, Duc Le, Frank Zhang, Christian Fuegen, Ozlem Kalinli, Yatharth Saraf, Abdelrahman Mohamed

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2111.06046 (cross-list from cs.SD) [pdf, other]: Title: Music Score Expansion with Variable-Length Infilling

Authors: Chih-Pin Tan, Chin-Jui Chang, Alvin W.Y. Su, Yi-Hsuan Yang

Comments: To be published as a late-breaking demo paper at ISMIR 2021

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[126] arXiv:2111.06310 (cross-list from cs.CL) [pdf, other]: Title: Self-Normalized Importance Sampling for Neural Language Modeling

Authors: Zijian Yang, Yingbo Gao, Alexander Gerstenberger, Jintao Jiang, Ralf Schlüter, Hermann Ney

Comments: Accepted at INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127] arXiv:2111.06316 (cross-list from cs.SD) [pdf, other]: Title: Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport

Authors: Hsin-Yi Lin, Huan-Hsin Tseng, Xugang Lu, Yu Tsao

Comments: Accepted at NeurIPS 2021

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[128] arXiv:2111.06331 (cross-list from cs.SD) [pdf, other]: Title: Towards an Efficient Voice Identification Using Wav2Vec2.0 and HuBERT Based on the Quran Reciters Dataset

Authors: Aly Moustafa, Salah A. Aly

Comments: 5 pages, 9 figures, 2 tables

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[129] arXiv:2111.06531 (cross-list from cs.SD) [pdf, other]: Title: Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization

Authors: Byeonggeun Kim, Seunghan Yang, Jangho Kim, Simyung Chang

Comments: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[130] arXiv:2111.06643 (cross-list from cs.SD) [pdf, other]: Title: Fully Automatic Page Turning on Real Scores

Authors: Florian Henkel, Stephanie Schwaiger, Gerhard Widmer

Comments: ISMIR 2021 Late Breaking/Demo

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[131] arXiv:2111.06799 (cross-list from cs.CL) [pdf, other]: Title: Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR

Authors: Ondrej Klejch, Electra Wallington, Peter Bell

Comments: Submitted to Interspeech 2022

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[132] arXiv:2111.07094 (cross-list from cs.SD) [pdf, ps, other]: Title: Speech Emotion Recognition Using Deep Sparse Auto-Encoder Extreme Learning Machine with a New Weighting Scheme and Spectro-Temporal Features Along with Classical Feature Selection and A New Quantum-Inspired Dimension Reduction Method

Authors: Fatemeh Daneshfar, Seyed Jahanshah Kabudian

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[133] arXiv:2111.07116 (cross-list from cs.SD) [pdf, other]: Title: Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion

Authors: Chao Xie, Yi-Chiao Wu, Patrick Lumban Tobing, Wen-Chin Huang, Tomoki Toda

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134] arXiv:2111.07234 (cross-list from cs.SD) [pdf, ps, other]: Title: Speech Emotion Recognition System by Quaternion Nonlinear Echo State Network

Authors: Fatemeh Daneshfar, Seyed Jahanshah Kabudian

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135] arXiv:2111.07402 (cross-list from cs.CL) [pdf, other]: Title: Textless Speech Emotion Conversion using Discrete and Decomposed Representations

Authors: Felix Kreuk, Adam Polyak, Jade Copet, Eugene Kharitonov, Tu-Anh Nguyen, Morgane Rivière, Wei-Ning Hsu, Abdelrahman Mohamed, Emmanuel Dupoux, Yossi Adi

Comments: Published at EMNLP 2022

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136] arXiv:2111.07454 (cross-list from cs.CL) [pdf, other]: Title: Towards Interpretability of Speech Pause in Dementia Detection using Adversarial Learning

Authors: Youxiang Zhu, Bang Tran, Xiaohui Liang, John A. Batsis, Robert M. Roth

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137] arXiv:2111.07518 (cross-list from cs.SD) [pdf, other]: Title: Time-Frequency Attention for Monaural Speech Enhancement

Authors: Qiquan Zhang, Qi Song, Zhaoheng Ni, Aaron Nicolson, Haizhou Li

Comments: 5 pages, 4 figures, Accepted and presented at ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138] arXiv:2111.07549 (cross-list from cs.CL) [pdf, other]: Title: Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data

Authors: Zhu Li, Yuqing Zhang, Mengxi Nie, Ming Yan, Mengnan He, Ruixiong Zhang, Caixia Gong

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[139] arXiv:2111.07657 (cross-list from cs.SD) [pdf, ps, other]: Title: Symbolic Music Loop Generation with VQ-VAE

Authors: Sangjun Han, Hyeongrae Ihm, Woohyung Lim

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[140] arXiv:2111.07979 (cross-list from cs.SD) [pdf, other]: Title: Metric-based multimodal meta-learning for human movement identification via footstep recognition

Authors: Muhammad Shakeel, Katsutoshi Itoyama, Kenji Nishida, Kazuhiro Nakadai

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Systems and Control (eess.SY); Neurons and Cognition (q-bio.NC)
[141] arXiv:2111.08046 (cross-list from cs.CV) [pdf, other]: Title: Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention

Authors: Kranti Kumar Parida, Siddharth Srivastava, Gaurav Sharma

Comments: To appear in WACV 2022. arXiv admin note: text overlap with arXiv:2108.04906

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142] arXiv:2111.08137 (cross-list from cs.CL) [pdf, other]: Title: Joint Unsupervised and Supervised Training for Multilingual ASR

Authors: Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143] arXiv:2111.08191 (cross-list from cs.CL) [pdf, other]: Title: CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis

Authors: Nianzu Zheng, Liqun Deng, Wenyong Huang, Yu Ting Yeung, Baohua Xu, Yuanyuan Guo, Yasheng Wang, Xiao Chen, Xin Jiang, Qun Liu

Comments: 5 pages, 4 figures, Accepted by INTERSPEECH 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144] arXiv:2111.08196 (cross-list from cs.SD) [pdf, other]: Title: An Exploratory Study on Perceptual Spaces of the Singing Voice

Authors: Brendan O'Connor, Simon Dixon, George Fazekas

Comments: In Proceedings of the 2020 Joint Conference on AI Music Creativity (CSMC-MuMe 2020), Stockholm, Sweden, October 15-19, 2020

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145] arXiv:2111.08327 (cross-list from cs.SD) [pdf, other]: Title: Detecting acoustic reflectors using a robot's ego-noise

Authors: Usama Saqib (AAU), Antoine Deleforge (MULTISPEECH), Jesper Jensen (AAU)

Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2021, Toronto, Canada

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146] arXiv:2111.08380 (cross-list from cs.MM) [pdf, other]: Title: Video Background Music Generation with Controllable Music Transformer

Authors: Shangzhe Di, Zeren Jiang, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan

Comments: Accepted to ACM Multimedia 2021. Project website at this https URL

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147] arXiv:2111.08400 (cross-list from cs.CL) [pdf, other]: Title: Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition

Authors: Yi-Chang Chen, Chun-Yen Cheng, Chien-An Chen, Ming-Chieh Sung, Yi-Ren Yeh

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148] arXiv:2111.08839 (cross-list from cs.SD) [pdf, other]: Title: Zero-shot Singing Technique Conversion

Authors: Brendan O'Connor, Simon Dixon, George Fazekas

Comments: In Proceedings of the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR 2021), Tokyo, Japan, November 15-16, 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149] arXiv:2111.08910 (cross-list from cs.SD) [pdf, other]: Title: Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Authors: Hengshun Zhou, Jun Du, Yuanyuan Zhang, Qing Wang, Qing-Feng Liu, Chin-Hui Lee

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[150] arXiv:2111.09014 (cross-list from cs.SD) [pdf, ps, other]: Title: Subject Enveloped Deep Sample Fuzzy Ensemble Learning Algorithm of Parkinson's Speech Data

Authors: Yiwen Wang, Fan Li, Xiaoheng Zhang, Pin Wang, Yongming Li

Comments: 18 pages, 4 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[151] arXiv:2111.09052 (cross-list from cs.SD) [pdf, other]: Title: High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency

Authors: Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Aimilios Chalamandaris, Georgia Maniati, Panos Kakoulidis, Spyros Raptis, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis

Comments: Proceedings of INTERSPEECH 2020

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[152] arXiv:2111.09075 (cross-list from cs.SD) [pdf, ps, other]: Title: Cross-lingual Low Resource Speaker Adaptation Using Phonological Features

Authors: Georgia Maniati, Nikolaos Ellinas, Konstantinos Markopoulos, Georgios Vamvoukakis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis

Comments: Proceedings of INTERSPEECH 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[153] arXiv:2111.09146 (cross-list from cs.SD) [pdf, other]: Title: Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control

Authors: Konstantinos Markopoulos, Nikolaos Ellinas, Alexandra Vioni, Myrsini Christidou, Panos Kakoulidis, Georgios Vamvoukakis, Georgia Maniati, June Sig Sung, Hyoungmin Park, Pirros Tsiakoulis, Aimilios Chalamandaris

Comments: Proceedings of the 11th ISCA Speech Synthesis Workshop (SSW 11)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[154] arXiv:2111.09296 (cross-list from cs.CL) [pdf, other]: Title: XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

Authors: Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[155] arXiv:2111.09642 (cross-list from cs.SD) [pdf, other]: Title: Towards Intelligibility-Oriented Audio-Visual Speech Enhancement

Authors: Tassadaq Hussain, Mandar Gogate, Kia Dashtipour, Amir Hussain

Comments: 6 pages, 4 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[156] arXiv:2111.09771 (cross-list from cs.MM) [pdf, other]: Title: Transformer-S2A: Robust and Efficient Speech-to-Animation

Authors: Liyang Chen, Zhiyong Wu, Jun Ling, Runnan Li, Xu Tan, Sheng Zhao

Comments: Accepted by ICASSP 2022

Subjects: Multimedia (cs.MM); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157] arXiv:2111.09931 (cross-list from cs.SD) [pdf, other]: Title: DawDreamer: Bridging the Gap Between Digital Audio Workstations and Python Interfaces

Authors: David Braun

Comments: 3 pages, no figures. Included in the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[158] arXiv:2111.10003 (cross-list from cs.SD) [pdf, other]: Title: Differentiable Wavetable Synthesis

Authors: Siyuan Shan, Lamtharn Hantrakul, Jitong Chen, Matt Avent, David Trevelyan

Comments: Accepted by ICASSP 2022, Demo: this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[159] arXiv:2111.10157 (cross-list from cs.CL) [pdf, other]: Title: Lattention: Lattice-attention in ASR rescoring

Authors: Prabhat Pandey, Sergio Duarte Torres, Ali Orkan Bayer, Ankur Gandhe, Volker Leutnant

Comments: Submitted to ICASSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160] arXiv:2111.10168 (cross-list from cs.SD) [pdf, other]: Title: Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control

Authors: Myrsini Christidou, Alexandra Vioni, Nikolaos Ellinas, Georgios Vamvoukakis, Konstantinos Markopoulos, Panos Kakoulidis, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis

Comments: Proceedings of SPECOM 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[161] arXiv:2111.10173 (cross-list from cs.SD) [pdf, other]: Title: Word-Level Style Control for Expressive, Non-attentive Speech Synthesis

Authors: Konstantinos Klapsas, Nikolaos Ellinas, June Sig Sung, Hyoungmin Park, Spyros Raptis

Comments: Proceedings of SPECOM 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[162] arXiv:2111.10177 (cross-list from cs.SD) [pdf, other]: Title: Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis

Authors: Alexandra Vioni, Myrsini Christidou, Nikolaos Ellinas, Georgios Vamvoukakis, Panos Kakoulidis, Taehoon Kim, June Sig Sung, Hyoungmin Park, Aimilios Chalamandaris, Pirros Tsiakoulis

Comments: Proceedings of ICASSP 2021

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[163] arXiv:2111.10235 (cross-list from cs.SD) [pdf, other]: Title: Interpreting deep urban sound classification using Layer-wise Relevance Propagation

Authors: Marco Colussi, Stavros Ntalampiras

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[164] arXiv:2111.10367 (cross-list from cs.CL) [pdf, other]: Title: SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech

Authors: Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han

Comments: Updated preprint for SLUE Benchmark v0.2; Toolkit link this https URL

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165] arXiv:2111.10592 (cross-list from cs.SD) [pdf, other]: Title: Deep Spoken Keyword Spotting: An Overview

Authors: Iván López-Espejo, Zheng-Hua Tan, John Hansen, Jesper Jensen

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[166] arXiv:2111.10639 (cross-list from cs.SD) [pdf, other]: Title: Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection

Authors: Samuele Cornell, Thomas Balestri, Thibaud Sénéchal

Comments: To be presented at SLT 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[167] arXiv:2111.10783 (cross-list from cs.SD) [pdf, ps, other]: Title: Automatic Detection of Depression from Stratified Samples of Audio Data

Authors: Pongpak Manoret, Punnatorn Chotipurk, Sompoom Sunpaweravong, Chanati Jantrachotechatchawan, Kobchai Duangrattanalert

Comments: 30 pages, 6 figures

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[168] arXiv:2111.10882 (cross-list from cs.CV) [pdf, other]: Title: Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video

Authors: Rishabh Garg, Ruohan Gao, Kristen Grauman

Comments: Published in BMVC 2021, project page: this http URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169] arXiv:2111.10897 (cross-list from cs.SD) [pdf, other]: Title: Health Monitoring of Industrial machines using Scene-Aware Threshold Selection

Authors: Arshdeep Singh, Raju Arvind, Padmanabhan Rajan

Comments: 5 pages, 4 figures, 1 table

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[170] arXiv:2111.11023 (cross-list from cs.SD) [pdf, ps, other]: Title: Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

Authors: Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[171] arXiv:2111.11063 (cross-list from cs.SD) [pdf, other]: Title: Comparing the Accuracy of Deep Neural Networks (DNN) and Convolutional Neural Network (CNN) in Music Genre Recognition (MGR): Experiments on Kurdish Music

Authors: Aza Zuhair, Hossein Hassani

Comments: 8 pages, 5 figures, 3 tables

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[172] arXiv:2111.11636 (cross-list from cs.SD) [pdf, ps, other]: Title: Music Classification: Beyond Supervised Learning, Towards Real-world Applications

Authors: Minz Won, Janne Spijkervet, Keunwoo Choi

Comments: This is a web book written for a tutorial session of the 22nd International Society for Music Information Retrieval Conference, Nov 8-12, 2021. Please visit this https URL for the original, web book format

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[173] arXiv:2111.11703 (cross-list from cs.LG) [pdf, other]: Title: A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence

Authors: Taketo Akama

Comments: 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021; 8 pages

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[174] arXiv:2111.11737 (cross-list from cs.SD) [pdf, ps, other]: Title: ADTOF: A large dataset of non-synthetic music for automatic drum transcription

Authors: Mickael Zehren, Marco Alunno, Paolo Bientinesi

Comments: Proceedings of the 22nd International Society for Music Information Retrieval Conference, ISMIR, Online, pp. 818-824

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175] arXiv:2111.11755 (cross-list from cs.SD) [pdf, other]: Title: Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance

Authors: Heeseung Kim, Sungwon Kim, Sungroh Yoon

Comments: 15 pages, 5 figures, ICML 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[176] arXiv:2111.11773 (cross-list from cs.SD) [pdf, other]: Title: Upsampling layers for music source separation

Authors: Jordi Pons, Joan Serrà, Santiago Pascual, Giulio Cengarle, Daniel Arteaga, Davide Scaini

Comments: Demo page: this http URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[177] arXiv:2111.11859 (cross-list from cs.SD) [pdf, ps, other]: Title: Longitudinal Speech Biomarkers for Automated Alzheimer's Detection

Authors: Jordi Laguarta Soler, Brian Subirana

Journal-ref: Frontiers in Computer Science, 08 April 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[178] arXiv:2111.12028 (cross-list from cs.CL) [pdf, ps, other]: Title: Romanian Speech Recognition Experiments from the ROBIN Project

Authors: Andrei-Marius Avram, Vasile Păiş, Dan Tufiş

Comments: 12 pages, 3 figures, ConsILR2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179] arXiv:2111.12124 (cross-list from cs.SD) [pdf, ps, other]: Title: Towards Learning Universal Audio Representations

Authors: Luyu Wang, Pauline Luc, Yan Wu, Adria Recasens, Lucas Smaira, Andrew Brock, Andrew Jaegle, Jean-Baptiste Alayrac, Sander Dieleman, Joao Carreira, Aaron van den Oord

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[180] arXiv:2111.12324 (cross-list from cs.SD) [pdf, other]: Title: How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition

Authors: Haoran Sun, Lantian Li, Thomas Fang Zheng, Dong Wang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[181] arXiv:2111.12326 (cross-list from cs.SD) [pdf, other]: Title: A Study on Decoupled Probabilistic Linear Discriminant Analysis

Authors: Di Wang, Lantian Li, Hongzhi Yu, Dong Wang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[182] arXiv:2111.12331 (cross-list from cs.SD) [pdf, other]: Title: An MAP Estimation for Between-Class Variance

Authors: Jiao Han, Yunqi Cai, Lantian Li, Guanyu Li, Dong Wang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[183] arXiv:2111.12531 (cross-list from cs.SD) [pdf, ps, other]: Title: Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations

Authors: Alex F. McKinney, Benjamin Cauchi

Comments: 4 pages + 1 refs; 1 figure; accepted at IEEE SPL (to appear)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[184] arXiv:2111.12566 (cross-list from q-bio.QM) [pdf, other]: Title: Acoustical Analysis of Speech Under Physical Stress in Relation to Physical Activities and Physical Literacy

Authors: Si-Ioi Ng, Rui-Si Ma, Tan Lee, Raymond Kim-Wai Sum

Comments: Accepted to Speech Prosody 2022

Subjects: Quantitative Methods (q-bio.QM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185] arXiv:2111.12588 (cross-list from cs.SD) [pdf, other]: Title: Towards Cross-Cultural Analysis using Music Information Dynamics

Authors: Shlomo Dubnov, Kevin Huang, Cheng-i Wang

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[186] arXiv:2111.12761 (cross-list from cs.SD) [pdf, other]: Title: Semi-Supervised Audio Classification with Partially Labeled Data

Authors: Siddharth Gururani, Alexander Lerch

Comments: To be presented at IEEE ISM 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187] arXiv:2111.12869 (cross-list from cs.SD) [pdf, other]: Title: Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation

Authors: Wangkai Jin, Junyu Liu, Jianfeng Ren, Xiangjun Peng

Comments: Under review for ICASSP 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188] arXiv:2111.12884 (cross-list from physics.ins-det) [src]: Title: A novel time delay estimation algorithm of acoustic pyrometry for furnace

Authors: Qi Liu, Bin Zhou, Jianyong Zhang, Ruixue Cheng

Comments: Under revision

Subjects: Instrumentation and Detectors (physics.ins-det); Sound (cs.SD); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[189] arXiv:2111.12890 (cross-list from cs.CV) [pdf, other]: Title: V2C: Visual Voice Cloning

Authors: Qi Chen, Yuanqing Li, Yuankai Qi, Jiaqiu Zhou, Mingkui Tan, Qi Wu

Comments: 15 pages, 14 figures

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[190] arXiv:2111.12986 (cross-list from cs.SD) [pdf, other]: Title: A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody

Authors: Or Goren, Eliya Nachmani, Lior Wolf

Comments: Accepted for publication at MMM 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[191] arXiv:2111.13457 (cross-list from cs.SD) [pdf, other]: Title: Semi-Supervised Music Tagging Transformer

Authors: Minz Won, Keunwoo Choi, Xavier Serra

Comments: International Society for Music Information Retrieval (ISMIR) 2021

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192] arXiv:2111.13486 (cross-list from cs.CY) [pdf, other]: Title: When Creators Meet the Metaverse: A Survey on Computational Arts

Authors: Lik-Hang Lee, Zijun Lin, Rui Hu, Zhengya Gong, Abhishek Kumar, Tangyao Li, Sijia Li, Pan Hui

Comments: Submitted to ACM Computing Surveys, 36 pages

Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193] arXiv:2111.13694 (cross-list from cs.SD) [pdf, other]: Title: Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information

Authors: Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei

Comments: Submitted to ICASSP 2022, 5 pages, 2 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[194] arXiv:2111.14203 (cross-list from cs.SD) [pdf, ps, other]: Title: How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey

Authors: Zahra Khanjani, Gabrielle Watson, Vandana P. Janeja

Comments: Abbreviated version of a longer survey under review

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[195] arXiv:2111.14354 (cross-list from cs.SD) [pdf, ps, other]: Title: Responding to Challenge Call of Machine Learning Model Development in Diagnosing Respiratory Disease Sounds

Authors: Negin Melek

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[196] arXiv:2111.14448 (cross-list from cs.CV) [pdf, other]: Title: AVA-AVD: Audio-Visual Speaker Diarization in the Wild

Authors: Eric Zhongcong Xu, Zeyang Song, Satoshi Tsutsui, Chao Feng, Mang Ye, Mike Zheng Shou

Comments: ACMMM 2022

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[197] arXiv:2111.14479 (cross-list from cs.SD) [pdf, other]: Title: Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition

Authors: Junhao Xu, Jianwei Yu, Xunying Liu, Helen Meng

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[198] arXiv:2111.14706 (cross-list from cs.CL) [pdf, other]: Title: ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

Authors: Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

Comments: Accepted at ICASSP 2022 (5 pages)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[199] arXiv:2111.14843 (cross-list from cs.SD) [pdf, other]: Title: Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds

Authors: Abdelrahman Younes, Daniel Honerkamp, Tim Welschehold, Abhinav Valada

Comments: This paper has been accepted for publication in IEEE Robotics and Automation Letters

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[200] arXiv:2111.14951 (cross-list from cs.HC) [pdf, other]: Title: Expressive Communication: A Common Framework for Evaluating Developments in Generative Models and Steering Interfaces

Authors: Ryan Louie, Jesse Engel, Anna Huang

Comments: 15 pages, 6 figures, submitted to ACM Intelligent User Interfaces 2022 Conference

Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[201] arXiv:2111.15016 (cross-list from cs.CL) [pdf, other]: Title: Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

Authors: Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[202] arXiv:2111.15156 (cross-list from cs.CL) [pdf, other]: Title: Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency

Authors: Pakhi Bamdev, Manraj Singh Grover, Yaman Kumar Singla, Payman Vafaee, Mika Hama, Rajiv Ratn Shah

Comments: Accepted for publication in the International Journal of Artificial Intelligence in Education (IJAIED)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[203] arXiv:2111.15159 (cross-list from cs.SD) [pdf, other]: Title: CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer

Authors: Changzeng Fu, Chaoran Liu, Carlos Toshinori Ishi, Hiroshi Ishiguro

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[204] arXiv:2111.15222 (cross-list from cs.SD) [pdf, other]: Title: SP-SEDT: Self-supervised Pre-training for Sound Event Detection Transformer

Authors: Zhirong Ye, Xiangdong Wang, Hong Liu, Yueliang Qian, Rui Tao, Long Yan, Kazushige Ouchi

Comments: Submitted to Interspeech 2022; added experiments to Section 4

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
