Audio and Speech Processing

Authors and titles for recent submissions, skipping first 42

Thu, 18 Apr 2024
Wed, 17 Apr 2024
Tue, 16 Apr 2024
Mon, 15 Apr 2024
Fri, 12 Apr 2024

[ total of 35 entries: 1-25 | 11-35 ]

Wed, 17 Apr 2024 (continued, showing last 5 of 9 entries)

[11] arXiv:2404.10316 (cross-list from cs.SD) [pdf, ps, other]: Title: Multiple Mobile Target Detection and Tracking in Active Sonar Array Using a Track-Before-Detect Approach

Authors: Avi Abu, Nikola Miskovic, Oleg Chebotar, Neven Cukrov, Roee Diamant

Comments: 10 pages, 10 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[12] arXiv:2404.10301 (cross-list from cs.SD) [pdf, other]: Title: Long-form music generation with latent diffusion

Authors: Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13] arXiv:2404.10299 (cross-list from cs.LG) [pdf, other]: Title: Clustering and Data Augmentation to Improve Accuracy of Sleep Assessment and Sleep Individuality Analysis

Authors: Shintaro Tamai, Masayuki Numao, Ken-ichi Fukui

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2404.10180 (cross-list from cs.CL) [pdf, other]: Title: Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR

Authors: Zelin Wu, Gan Song, Christopher Li, Pat Rondon, Zhong Meng, Xavier Velez, Weiran Wang, Diamantino Caseiro, Golan Pundak, Tsendsuren Munkhdalai, Angad Chandorkar, Rohit Prabhavalkar

Comments: 9 pages, 3 figures, accepted by NAACL 2024 - Industry Track

Journal-ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Industry Track

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[15] arXiv:2404.10112 (cross-list from cs.CL) [pdf, other]: Title: PRODIS - a speech database and a phoneme-based language model for the study of predictability effects in Polish

Authors: Zofia Malisz, Jan Foremski, Małgorzata Kul

Comments: To appear in the proceedings of LREC2024: Language Resources and Evaluation Conference 2024, Turin, Italy

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 16 Apr 2024

[16] arXiv:2404.09841 [pdf, other]: Title: Anatomy of Industrial Scale Multilingual ASR

Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[17] arXiv:2404.09385 [pdf, other]: Title: A Large-Scale Evaluation of Speech Foundation Models

Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

Comments: The extended journal version for SUPERB and SUPERB-SG. Accepted to TASLP. The arxiv version is further refined

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Signal Processing (eess.SP)
[18] arXiv:2404.09313 [pdf, other]: Title: Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment

Authors: Zhiqing Hong, Rongjie Huang, Xize Cheng, Yongqi Wang, Ruiqi Li, Fuming You, Zhou Zhao, Zhimeng Zhang

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[19] arXiv:2404.09956 (cross-list from cs.SD) [pdf, other]: Title: Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

Comments: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20] arXiv:2404.09466 (cross-list from cs.SD) [pdf, other]: Title: Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription

Authors: Yujia Yan, Zhiyao Duan

Comments: Fixed Typos

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:2404.09342 (cross-list from cs.CV) [pdf, other]: Title: Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

Comments: ACM Multimedia Conference - Grand Challenge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2404.09192 (cross-list from cs.SD) [pdf, other]: Title: Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23] arXiv:2404.09177 (cross-list from cs.SD) [pdf, other]: Title: An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging

Authors: Gabriel Meseguer-Brocal, Dorian Desblancs, Romain Hennequin

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:2404.08857 (cross-list from cs.SD) [pdf, other]: Title: Voice Attribute Editing with Text Prompt

Authors: Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[25] arXiv:2404.08813 (cross-list from cs.HC) [pdf, other]: Title: Interactive Sonification for Health and Energy using ChucK and Unity

Authors: Yichun Zhao, George Tzanetakis

Comments: In the Proceedings of the Conference on Sonification of Health and Environmental Data (SoniHED 2022). this http URL

Journal-ref: Conference on Sonification of Health and Environmental Data (SoniHED 2022)

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 15 Apr 2024

[26] arXiv:2404.08064 [pdf, ps, other]: Title: The Impact of Speech Anonymization on Pathology and Its Limits

Authors: Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
[27] arXiv:2404.08264 (cross-list from cs.MM) [pdf, other]: Title: Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis

Authors: Masahiro Yasuda, Noboru Harada, Yasunori Ohishi, Shoichiro Saito, Akira Nakayama, Nobutaka Ono

Comments: 13 pages, 7 figures, under review

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[28] arXiv:2404.08022 (cross-list from cs.SD) [pdf, other]: Title: A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2

Authors: Thomas Serre (S2A, IDS), Mathieu Fontaine (S2A, IDS), Éric Benhaim, Geoffroy Dutour, Slim Essid (S2A, IDS)

Comments: Accepted at HSCMA24, Satellite workshop of ICASSP24

Journal-ref: ICASSP, Apr 2024, Seoul (Korea), South Korea

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 12 Apr 2024

[29] arXiv:2404.07970 [pdf, other]: Title: Differentiable All-pole Filters for Time-varying Audio Systems

Authors: Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas

Comments: Submitted to DAFx 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[30] arXiv:2404.07341 [pdf, other]: Title: Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

Authors: Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[31] arXiv:2404.07226 [pdf, other]: Title: Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models

Authors: Alkis Koudounas, Flavio Giobergia

Comments: 2 pages

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[32] arXiv:2404.07989 (cross-list from cs.CV) [pdf, other]: Title: Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

Authors: Yiwen Tang, Jiaming Liu, Dong Wang, Zhigang Wang, Shanghang Zhang, Bin Zhao, Xuelong Li

Comments: Code and models are released at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2404.07616 (cross-list from cs.CL) [pdf, other]: Title: Audio Dialogues: Dialogues dataset for audio and music understanding

Authors: Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Comments: Demo website: this https URL

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34] arXiv:2404.07575 (cross-list from cs.SD) [pdf, ps, other]: Title: An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

Authors: Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Comments: Accepted to NAACL 2024 Findings

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[35] arXiv:2404.07336 (cross-list from cs.CV) [pdf, other]: Title: PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores

Authors: Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu J. Han

Comments: 24 pages

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)


