Audio and Speech Processing

Authors and titles for recent submissions

[ total of 37 entries: 1-31 | 32-37 ]

Fri, 19 Apr 2024

[1] arXiv:2404.11621 [pdf, ps, other]: Title: Efficient High-Performance Bark-Scale Neural Network for Residual Echo and Noise Suppression

Authors: Ernst Seidel, Pejman Mowlaee, Tim Fingscheidt

Comments: accepted to ICASSP 2024; 5 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS)
[2] arXiv:2404.11619 [pdf, ps, other]: Title: Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech

Authors: Shannon Wotherspoon, William Hartmann, Matthew Snover

Comments: 2 pages

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[3] arXiv:2404.12299 (cross-list from cs.CL) [pdf, other]: Title: Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair

Authors: Yusuke Sakai, Mana Makinae, Hidetaka Kamigaito, Taro Watanabe

Comments: 23 pages, 9 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2404.12251 (cross-list from cs.LG) [pdf, other]: Title: Dynamic Modality and View Selection for Multimodal Emotion Recognition with Missing Modalities

Authors: Luciana Trinkaus Menon, Luiz Carlos Ribeiro Neduziak, Jean Paul Barddal, Alessandro Lameiras Koerich, Alceu de Souza Britto Jr

Comments: 15 pages

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5] arXiv:2404.12132 (cross-list from cs.SD) [pdf, other]: Title: Enhancing Suicide Risk Assessment: A Speech-Based Automated Approach in Emergency Medicine

Authors: Shahin Amiriparian, Maurice Gerczuk, Justina Lutz, Wolfgang Strube, Irina Papazova, Alkomiet Hasan, Alexander Kathan, Björn W. Schuller

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[6] arXiv:2404.12077 (cross-list from cs.SD) [pdf, other]: Title: TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches

Authors: Rong Wang, Kun Sun

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7] arXiv:2404.12062 (cross-list from cs.SD) [pdf, other]: Title: MIDGET: Music Conditioned 3D Dance Generation

Authors: Jinwu Wang, Wei Mao, Miaomiao Liu

Comments: 12 pages, 6 figures. Published in AI 2023: Advances in Artificial Intelligence

Journal-ref: In Australasian Joint Conference on Artificial Intelligence (pp. 277-288). Singapore: Springer Nature Singapore 2023

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Audio and Speech Processing (eess.AS)
[8] arXiv:2404.11976 (cross-list from cs.SD) [pdf, other]: Title: Large Language Models: From Notes to Musical Form

Authors: Lilac Atassi

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9] arXiv:2404.11938 (cross-list from cs.MM) [pdf, other]: Title: HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis

Authors: Zhuojia Wu, Qi Zhang, Duoqian Miao, Kun Yi, Wei Fan, Liang Hu

Comments: 13 pages, IJCAI-2024

Subjects: Multimedia (cs.MM); Distributed, Parallel, and Cluster Computing (cs.DC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 18 Apr 2024

[10] arXiv:2404.11399 [pdf, other]: Title: In situ sound absorption estimation with the discrete complex image source method

Authors: Eric Brandao, William Fonseca, Paulo Mareze, Carlos Resende, Gabriel Azzuz, Joao Pontalti, Efren Fernandez-Grande

Comments: 37 pages, 12 figures, original manuscript to be submitted to the Journal of Sound and Vibration

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Classical Physics (physics.class-ph)
[11] arXiv:2404.11275 (cross-list from cs.SD) [pdf, other]: Title: Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation

Authors: Ye Bai, Chenxing Li, Hao Li, Yuanyuan Zhao, Xiaorui Wang

Comments: Accepted by ICME 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2404.11116 (cross-list from cs.SD) [pdf, other]: Title: Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge

Authors: Keren Shao, Ke Chen, Shlomo Dubnov

Comments: 2 pages, 2 figures, 1 table, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[13] arXiv:2404.10989 (cross-list from cs.CV) [pdf, other]: Title: FairSSD: Understanding Bias in Synthetic Speech Detectors

Authors: Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

Comments: Accepted at CVPR 2024 (WMF)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2404.10922 (cross-list from cs.CL) [pdf, other]: Title: Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

Authors: Pavel Denisov, Ngoc Thang Vu

Comments: NAACL Findings 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15] arXiv:2404.10842 (cross-list from cs.SD) [pdf, ps, other]: Title: Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning

Authors: Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas

Comments: 11 pages, 7 figures, 1 table

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Wed, 17 Apr 2024

[16] arXiv:2404.10419 [pdf, other]: Title: MAD Speech: Measures of Acoustic Diversity of Speech

Authors: Matthieu Futeral, Andrea Agostinelli, Marco Tagliasacchi, Neil Zeghidour, Eugene Kharitonov

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[17] arXiv:2404.10310 [pdf, other]: Title: Wireless Earphone-based Real-Time Monitoring of Breathing Exercises: A Deep Learning Approach

Authors: Hassam Khan Wazir, Zaid Waghoo, Vikram Kapila

Comments: 4 pages, 2 figures. Paper accepted at IEEE International Conference on Engineering in Medicine & Biology Society, 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[18] arXiv:2404.10578 (cross-list from cs.SD) [pdf, other]: Title: Vivo : une approche multimodale de la synthese concatenative par corpus dans le cadre d'une oeuvre audiovisuelle immersive

Authors: Mateo Fayet

Comments: in French language

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2404.10440 (cross-list from cs.CL) [pdf, other]: Title: Language Proficiency and F0 Entrainment: A Study of L2 English Imitation in Italian, French, and Slovak Speakers

Authors: Zheng Yuan, Štefan Beňuš, Alessandro D'Ausilio

Comments: Accepted at Speech Prosody 2024

Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20] arXiv:2404.10316 (cross-list from cs.SD) [pdf, ps, other]: Title: Multiple Mobile Target Detection and Tracking in Active Sonar Array Using a Track-Before-Detect Approach

Authors: Avi Abu, Nikola Miskovic, Oleg Chebotar, Neven Cukrov, Roee Diamant

Comments: 10 pages, 10 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[21] arXiv:2404.10301 (cross-list from cs.SD) [pdf, other]: Title: Long-form music generation with latent diffusion

Authors: Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22] arXiv:2404.10299 (cross-list from cs.LG) [pdf, other]: Title: Clustering and Data Augmentation to Improve Accuracy of Sleep Assessment and Sleep Individuality Analysis

Authors: Shintaro Tamai, Masayuki Numao, Ken-ichi Fukui

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2404.10180 (cross-list from cs.CL) [pdf, other]: Title: Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

Authors: Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

Comments: 9 pages, 3 figures, accepted by NAACL 2024 - Industry Track

Journal-ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[24] arXiv:2404.10112 (cross-list from cs.CL) [pdf, other]: Title: PRODIS -- a speech database and a phoneme-based language model for the study of predictability effects in Polish

Authors: Zofia Malisz, Jan Foremski, Małgorzata Kul

Comments: To appear in the proceedings of LREC2024: Language Resources and Evaluation Conference 2024, Turin, Italy

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 16 Apr 2024 (showing first 7 of 10 entries)

[25] arXiv:2404.09841 [pdf, other]: Title: Anatomy of Industrial Scale Multilingual ASR

Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[26] arXiv:2404.09385 [pdf, other]: Title: A Large-Scale Evaluation of Speech Foundation Models

Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

Comments: The extended journal version for SUPERB and SUPERB-SG. Accepted to TASLP. The arxiv version is further refined

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Signal Processing (eess.SP)
[27] arXiv:2404.09313 [pdf, other]: Title: Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

Authors: Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[28] arXiv:2404.09956 (cross-list from cs.SD) [pdf, other]: Title: Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[29] arXiv:2404.09466 (cross-list from cs.SD) [pdf, other]: Title: Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription

Authors: Yujia Yan, Zhiyao Duan

Comments: Fixed Typos

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30] arXiv:2404.09342 (cross-list from cs.CV) [pdf, other]: Title: Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

Comments: ACM Multimedia Conference - Grand Challenge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:2404.09192 (cross-list from cs.SD) [pdf, other]: Title: Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
