Sound

Authors and titles for recent submissions

Thu, 18 Apr 2024
Wed, 17 Apr 2024
Tue, 16 Apr 2024
Mon, 15 Apr 2024
Fri, 12 Apr 2024

[ total of 26 entries: 1-26 ]

Thu, 18 Apr 2024

[1] arXiv:2404.11275 [pdf, other]: Title: Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation

Authors: Ye Bai, Chenxing Li, Hao Li, Yuanyuan Zhao, Xiaorui Wang

Comments: Accepted by ICME 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2404.11116 [pdf, other]: Title: Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge

Authors: Keren Shao, Ke Chen, Shlomo Dubnov

Comments: 2 pages, 2 figures, 1 table, Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[3] arXiv:2404.10842 [pdf, ps, other]: Title: Unsupervised Speaker Diarization in Distributed IoT Networks Using Federated Learning

Authors: Amit Kumar Bhuyan, Hrishikesh Dutta, Subir Biswas

Comments: 11 pages, 7 figures, 1 table

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2404.11399 (cross-list from eess.AS) [pdf, other]: Title: In situ sound absorption estimation with the discrete complex image source method

Authors: Eric Brandao, William Fonseca, Paulo Mareze, Carlos Resende, Gabriel Azzuz, Joao Pontalti, Efren Fernandez-Grande

Comments: 37 pages, 12 figures, original manuscript to be submitted to the Journal of Sound and Vibration

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Classical Physics (physics.class-ph)
[5] arXiv:2404.10989 (cross-list from cs.CV) [pdf, other]: Title: FairSSD: Understanding Bias in Synthetic Speech Detectors

Authors: Amit Kumar Singh Yadav, Kratika Bhagtani, Davide Salvi, Paolo Bestagini, Edward J. Delp

Comments: Accepted at CVPR 2024 (WMF)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6] arXiv:2404.10922 (cross-list from cs.CL) [pdf, other]: Title: Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

Authors: Pavel Denisov, Ngoc Thang Vu

Comments: NAACL Findings 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 17 Apr 2024

[7] arXiv:2404.10578 [pdf, other]: Title: Vivo : une approche multimodale de la synthese concatenative par corpus dans le cadre d'une oeuvre audiovisuelle immersive

Authors: Mateo Fayet

Comments: in French language

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2404.10316 [pdf, ps, other]: Title: Multiple Mobile Target Detection and Tracking in Active Sonar Array Using a Track-Before-Detect Approach

Authors: Avi Abu, Nikola Miskovic, Oleg Chebotar, Neven Cukrov, Roee Diamant

Comments: 10 pages, 10 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[9] arXiv:2404.10301 [pdf, other]: Title: Long-form music generation with latent diffusion

Authors: Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[10] arXiv:2404.10299 (cross-list from cs.LG) [pdf, other]: Title: Clustering and Data Augmentation to Improve Accuracy of Sleep Assessment and Sleep Individuality Analysis

Authors: Shintaro Tamai, Masayuki Numao, Ken-ichi Fukui

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2404.10112 (cross-list from cs.CL) [pdf, other]: Title: PRODIS - a speech database and a phoneme-based language model for the study of predictability effects in Polish

Authors: Zofia Malisz, Jan Foremski, Małgorzata Kul

Comments: To appear in the proceedings of LREC2024: Language Resources and Evaluation Conference 2024, Turin, Italy

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 16 Apr 2024

[12] arXiv:2404.09956 [pdf, other]: Title: Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

Comments: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[13] arXiv:2404.09466 [pdf, other]: Title: Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription

Authors: Yujia Yan, Zhiyao Duan

Comments: Fixed Typos

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14] arXiv:2404.09192 [pdf, other]: Title: Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[15] arXiv:2404.09177 [pdf, other]: Title: An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging

Authors: Gabriel Meseguer-Brocal, Dorian Desblancs, Romain Hennequin

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16] arXiv:2404.08857 [pdf, other]: Title: Voice Attribute Editing with Text Prompt

Authors: Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[17] arXiv:2404.09841 (cross-list from eess.AS) [pdf, other]: Title: Anatomy of Industrial Scale Multilingual ASR

Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[18] arXiv:2404.09342 (cross-list from cs.CV) [pdf, other]: Title: Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

Comments: ACM Multimedia Conference - Grand Challenge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2404.08813 (cross-list from cs.HC) [pdf, other]: Title: Interactive Sonification for Health and Energy using ChucK and Unity

Authors: Yichun Zhao, George Tzanetakis

Comments: In the Proceedings of the Conference on Sonification of Health and Environmental Data (SoniHED 2022). this http URL

Journal-ref: Conference on Sonification of Health and Environmental Data (SoniHED 2022)

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 15 Apr 2024

[20] arXiv:2404.08022 [pdf, other]: Title: A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2

Authors: Thomas Serre (S2A, IDS), Mathieu Fontaine (S2A, IDS), Éric Benhaim, Geoffroy Dutour, Slim Essid (S2A, IDS)

Comments: Accepted at HSCMA24, Satellite workshop of ICASSP24

Journal-ref: ICASSP, Apr 2024, Seoul (Korea), South Korea

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 12 Apr 2024

[21] arXiv:2404.07575 [pdf, ps, other]: Title: An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

Authors: Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Comments: Accepted to NAACL 2024 Findings

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[22] arXiv:2404.07989 (cross-list from cs.CV) [pdf, other]: Title: Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

Authors: Yiwen Tang, Jiaming Liu, Dong Wang, Zhigang Wang, Shanghang Zhang, Bin Zhao, Xuelong Li

Comments: Code and models are released at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2404.07970 (cross-list from eess.AS) [pdf, other]: Title: Differentiable All-pole Filters for Time-varying Audio Systems

Authors: Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas

Comments: Submitted to DAFx 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[24] arXiv:2404.07616 (cross-list from cs.CL) [pdf, other]: Title: Audio Dialogues: Dialogues dataset for audio and music understanding

Authors: Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Comments: Demo website: this https URL

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2404.07341 (cross-list from eess.AS) [pdf, other]: Title: Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

Authors: Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[26] arXiv:2404.07226 (cross-list from eess.AS) [pdf, other]: Title: Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models

Authors: Alkis Koudounas, Flavio Giobergia

Comments: 2 pages

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
