Sound

Authors and titles for recent submissions

Tue, 16 Apr 2024
Mon, 15 Apr 2024
Fri, 12 Apr 2024
Thu, 11 Apr 2024
Wed, 10 Apr 2024

[ total of 27 entries: 1-27 ]

Tue, 16 Apr 2024

[1] arXiv:2404.09956 [pdf, other]: Title: Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization

Authors: Navonil Majumder, Chia-Yu Hung, Deepanway Ghosal, Wei-Ning Hsu, Rada Mihalcea, Soujanya Poria

Comments: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[2] arXiv:2404.09466 [pdf, other]: Title: Scoring Intervals using Non-hierarchical Transformer For Automatic Piano Transcription

Authors: Yujia Yan, Zhiyao Duan

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2404.09192 [pdf, other]: Title: Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

Authors: Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[4] arXiv:2404.09177 [pdf, other]: Title: An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging

Authors: Gabriel Meseguer-Brocal, Dorian Desblancs, Romain Hennequin

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5] arXiv:2404.08857 [pdf, other]: Title: Voice Attribute Editing with Text Prompt

Authors: Zhengyan Sheng, Yang Ai, Li-Juan Liu, Jia Pan, Zhen-Hua Ling

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6] arXiv:2404.09841 (cross-list from eess.AS) [pdf, other]: Title: Anatomy of Industrial Scale Multilingual ASR

Authors: Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Efty, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[7] arXiv:2404.09342 (cross-list from cs.CV) [pdf, other]: Title: Face-voice Association in Multilingual Environments (FAME) Challenge 2024 Evaluation Plan

Authors: Muhammad Saad Saeed, Shah Nawaz, Muhammad Salman Tahir, Rohan Kumar Das, Muhammad Zaigham Zaheer, Marta Moscati, Markus Schedl, Muhammad Haris Khan, Karthik Nandakumar, Muhammad Haroon Yousaf

Comments: ACM Multimedia Conference - Grand Challenge

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2404.08813 (cross-list from cs.HC) [pdf, other]: Title: Interactive Sonification for Health and Energy using ChucK and Unity

Authors: Yichun Zhao, George Tzanetakis

Comments: In the Proceedings of the Conference on Sonification of Health and Environmental Data (SoniHED 2022). this http URL

Journal-ref: Conference on Sonification of Health and Environmental Data (SoniHED 2022)

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 15 Apr 2024

[9] arXiv:2404.08022 [pdf, other]: Title: A lightweight dual-stage framework for personalized speech enhancement based on DeepFilterNet2

Authors: Thomas Serre (S2A, IDS), Mathieu Fontaine (S2A, IDS), Éric Benhaim, Geoffroy Dutour, Slim Essid (S2A, IDS)

Comments: Accepted at HSCMA24, Satellite workshop of ICASSP24

Journal-ref: ICASSP, Apr 2024, Seoul (Korea), South Korea

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 12 Apr 2024

[10] arXiv:2404.07575 [pdf, ps, other]: Title: An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution

Authors: Tien-Hong Lo, Fu-An Chao, Tzu-I Wu, Yao-Ting Sung, Berlin Chen

Comments: Accepted to NAACL 2024 Findings

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[11] arXiv:2404.07989 (cross-list from cs.CV) [pdf, other]: Title: Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding

Authors: Yiwen Tang, Jiaming Liu, Dong Wang, Zhigang Wang, Shanghang Zhang, Bin Zhao, Xuelong Li

Comments: Code and models are released at this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12] arXiv:2404.07970 (cross-list from eess.AS) [pdf, other]: Title: Differentiable All-pole Filters for Time-varying Audio Systems

Authors: Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas

Comments: Submitted to DAFx 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[13] arXiv:2404.07616 (cross-list from cs.CL) [pdf, other]: Title: Audio Dialogues: Dialogues dataset for audio and music understanding

Authors: Arushi Goel, Zhifeng Kong, Rafael Valle, Bryan Catanzaro

Comments: Demo website: this https URL

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:2404.07341 (cross-list from eess.AS) [pdf, other]: Title: Conformer-1: Robust ASR via Large-Scale Semisupervised Bootstrapping

Authors: Kevin Zhang, Luka Chkhetiani, Francis McCann Ramirez, Yash Khare, Andrea Vanzo, Michael Liang, Sergio Ramirez Martin, Gabriel Oexle, Ruben Bousbib, Taufiquzzaman Peyash, Michael Nguyen, Dillon Pulliam, Domenic Donato

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[15] arXiv:2404.07226 (cross-list from eess.AS) [pdf, other]: Title: Houston we have a Divergence: A Subgroup Performance Analysis of ASR Models

Authors: Alkis Koudounas, Flavio Giobergia

Comments: 2 pages

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

Thu, 11 Apr 2024

[16] arXiv:2404.06682 [pdf, other]: Title: Learning Multidimensional Disentangled Representations of Instrumental Sounds for Musical Similarity Assessment

Authors: Yuka Hashizume, Li Li, Atsushi Miyashita, Tomoki Toda

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2404.06674 [pdf, other]: Title: VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

Authors: Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18] arXiv:2404.06928 (cross-list from eess.AS) [pdf, other]: Title: Efficient Sound Field Reconstruction with Conditional Invertible Neural Networks

Authors: Xenofon Karakonstantis, Efren Fernandez-Grande, Peter Gerstoft

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19] arXiv:2404.06818 (cross-list from eess.AS) [pdf, other]: Title: Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models

Authors: Taegyun Kwon, Dasaem Jeong, Juhan Nam

Comments: 11 pages, 8 figures, preprint

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[20] arXiv:2404.06714 (cross-list from cs.CL) [pdf, other]: Title: Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness

Authors: Xincan Feng, Akifumi Yoshimoto

Comments: 9 pages, 2 figures, 4 tables; accepted at LREC-COLING 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21] arXiv:2404.06702 (cross-list from eess.AS) [pdf, other]: Title: What is Learnt by the LEArnable Front-end (LEAF)? Adapting Per-Channel Energy Normalisation (PCEN) to Noisy Conditions

Authors: Hanyu Meng, Vidhyasaharan Sethu, Eliathamby Ambikairajah

Comments: Interspeech 2023 Proceeding

Journal-ref: Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[22] arXiv:2404.06690 (cross-list from eess.AS) [pdf, other]: Title: CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Authors: Leying Zhang, Yao Qian, Long Zhou, Shujie Liu, Dongmei Wang, Xiaofei Wang, Midia Yousefi, Yanmin Qian, Jinyu Li, Lei He, Sheng Zhao, Michael Zeng

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Wed, 10 Apr 2024

[23] arXiv:2404.06393 [pdf, other]: Title: MuPT: A Generative Symbolic Music Pretrained Transformer

Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, Stephen W. Huang, Wenhu Chen, Jie Fu, Ge Zhang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[24] arXiv:2404.06103 [pdf, other]: Title: Exploring Diverse Sounds: Identifying Outliers in a Music Corpus

Authors: Le Cai, Sam Ferguson, Gengfa Fang, Hani Alshamrani

Journal-ref: The 16th International Symposium on Computer Music Multidisciplinary Research, 2023

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[25] arXiv:2404.05765 [pdf, ps, other]: Title: A Novel Bi-LSTM And Transformer Architecture For Generating Tabla Music

Authors: Roopa Mayya, Vivekanand Venkataraman, Anwesh P R, Narayana Darapaneni

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[26] arXiv:2404.06292 (cross-list from cs.CL) [pdf, other]: Title: nEMO: Dataset of Emotional Speech in Polish

Authors: Iwona Christop

Comments: Accepted for LREC-Coling 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2404.06095 (cross-list from eess.AS) [pdf, other]: Title: Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

Authors: Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino

Comments: 15 pages, 6 figures, 15 tables. Accepted by TASLP

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
