Sound

Authors and titles for recent submissions, skipping first 3

[ total of 37 entries: 1-25 | 4-28 | 29-37 ]

Thu, 16 May 2024 (continued, showing last 5 of 8 entries)

[4] arXiv:2405.09171 [pdf, other]

Title: Hierarchical Emotion Prediction and Control in Text-to-Speech Synthesis

Authors: Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li

Comments: This is accepted to IEEE ICASSP 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[5] arXiv:2405.09062 [pdf, other]

Title: Naturalistic Music Decoding from EEG Data via Latent Diffusion Models

Authors: Emilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo Akama

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[6] arXiv:2405.08838 [pdf, other]

Title: PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset

Authors: Yang Hou, Haitao Fu, Chuankai Chen, Zida Li, Haoyu Zhang, Jianjun Zhao

Comments: 13 pages, 4 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[7] arXiv:2405.09266 (cross-list from cs.CV) [pdf, other]

Title: Dance Any Beat: Blending Beats with Visuals in Dance Video Generation

Authors: Xuanchen Wang, Heng Wang, Dongnan Liu, Weidong Cai

Comments: 11 pages, 6 figures, demo page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[8] arXiv:2405.09142 (cross-list from eess.AS) [pdf, other]

Title: Speaker Embeddings With Weakly Supervised Voice Activity Detection For Efficient Speaker Diarization

Authors: Jenthe Thienpondt, Kris Demuynck

Comments: Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Wed, 15 May 2024

[9] arXiv:2405.08679 [pdf, other]

Title: Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning

Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

Comments: Self-supervision in Audio, Speech and Beyond workshop, IEEE International Conference on Acoustics, Speech, and Signal Processing, 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[10] arXiv:2405.08596 [src]

Title: EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark

Authors: Xiaohui Zhang, Jiangyan Yi, Jianhua Tao

Comments: This paper needs more modification

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[11] arXiv:2405.08342 [pdf, ps, other]

Title: Abnormal Respiratory Sound Identification Using Audio-Spectrogram Vision Transformer

Authors: Whenty Ariyanti, Kai-Chun Liu, Kuan-Yu Chen, Yu Tsao

Comments: Published in 2023 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

Journal-ref: 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (2023) 1-4

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[12] arXiv:2405.08021 [pdf, other]

Title: Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion

Authors: Zhao Ren, Kevin Scheck, Qinhan Hou, Stefano van Gogh, Michael Wand, Tanja Schultz

Comments: Accepted by EMBC 2024

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[13] arXiv:2405.08742 (cross-list from eess.AS) [pdf, ps, other]

Title: A tunable binaural audio telepresence system capable of balancing immersive and enhanced modes

Authors: Yicheng Hsu, Mingsian R. Bai

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

[14] arXiv:2405.08417 (cross-list from eess.AS) [pdf, other]

Title: Simple and Efficient Quantization Techniques for Neural Speech Coding

Authors: Andreas Brendel, Nicola Pia, Kishan Gupta, Guillaume Fuchs, Markus Multrus

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

[15] arXiv:2405.08317 (cross-list from cs.CL) [pdf, other]

Title: SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models

Authors: Raghuveer Peri, Sai Muralidhar Jayanthi, Srikanth Ronanki, Anshu Bhatia, Karel Mundnich, Saket Dingliwal, Nilaksh Das, Zejiang Hou, Goeric Huybrechts, Srikanth Vishnubhotla, Daniel Garcia-Romero, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Comments: 9+6 pages, Submitted to ACL 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[16] arXiv:2405.08295 (cross-list from cs.CL) [pdf, other]

Title: SpeechVerse: A Large-scale Generalizable Audio Language Model

Authors: Nilaksh Das, Saket Dingliwal, Srikanth Ronanki, Rohit Paturi, David Huang, Prashant Mathur, Jie Yuan, Dhanush Bekal, Xing Niu, Sai Muralidhar Jayanthi, Xilai Li, Karel Mundnich, Monica Sunkara, Sundararajan Srinivasan, Kyu J Han, Katrin Kirchhoff

Comments: Single column, 13 pages

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[17] arXiv:2405.08237 (cross-list from cs.CL) [pdf, other]

Title: A predictive learning model can simulate temporal dynamics and context effects found in neural representations of continuous speech

Authors: Oli Danyi Liu, Hao Tang, Naomi Feldman, Sharon Goldwater

Comments: Accepted to CogSci 2024

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[18] arXiv:2405.08096 (cross-list from eess.AS) [pdf, other]

Title: Semantic MIMO Systems for Speech-to-Text Transmission

Authors: Zhenzi Weng, Zhijin Qin, Huiqiang Xie, Xiaoming Tao, Khaled B. Letaief

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Tue, 14 May 2024 (showing first 10 of 13 entries)

[19] arXiv:2405.07682 [pdf, other]

Title: FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation

Authors: Jianyi Chen, Wei Xue, Xu Tan, Zhen Ye, Qifeng Liu, Yike Guo

Comments: IJCAI 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[20] arXiv:2405.07442 [pdf, ps, other]

Title: Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases

Authors: Pengfei Zhang, Zhihang Zheng, Shichen Zhang, Minghao Yang, Shaojun Tang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)

[21] arXiv:2405.07354 [pdf, other]

Title: SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset

Authors: Sushant Gautam, Mehdi Houshmand Sarkhoosh, Jan Held, Cise Midoglu, Anthony Cioppa, Silvio Giancola, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[22] arXiv:2405.07034 [pdf, ps, other]

Title: Towards an Accessible and Rapidly Trainable Rhythm Sequencer Using a Generative Stacked Autoencoder

Authors: Alex Wastnidge

Comments: 7 pages, 7 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[23] arXiv:2405.07029 [pdf, ps, other]

Title: A framework of text-dependent speaker verification for chinese numerical string corpus

Authors: Litong Zheng, Feng Hong, Weijie Xu, Wan Zheng

Comments: arXiv admin note: text overlap with arXiv:2312.01645

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)

[24] arXiv:2405.06995 [pdf, other]

Title: Benchmarking Cross-Domain Audio-Visual Deception Detection

Authors: Xiaobao Guo, Zitong Yu, Nithish Muthuchamy Selvaraj, Bingquan Shen, Adams Wai-Kin Kong, Alex C. Kot

Comments: 10 pages

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[25] arXiv:2405.06804 [pdf, other]

Title: Time-of-arrival Estimation and Phase Unwrapping of Head-related Transfer Functions With Integer Linear Programming

Authors: Chin-Yun Yu, Johan Pauwels, György Fazekas

Comments: Accepted to be presented at Audio Engineering Society 156th Convention, 2024 June, Madrid, Spain

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[26] arXiv:2405.06747 [pdf, other]

Title: Music Emotion Prediction Using Recurrent Neural Networks

Authors: Xinyu Chang, Xiangyu Zhang, Haoruo Zhang, Yulu Ran

Comments: 15 pages, 13 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[27] arXiv:2405.07930 (cross-list from cs.MM) [pdf, other]

Title: Improving Multimodal Learning with Multi-Loss Gradient Modulation

Authors: Konstantinos Kontras, Christos Chatzichristos, Matthew Blaschko, Maarten De Vos

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

[28] arXiv:2405.07700 (cross-list from cs.CL) [pdf, ps, other]

Title: Age-Dependent Analysis and Stochastic Generation of Child-Directed Speech

Authors: Okko Räsänen, Daniil Kocharov

Comments: Accepted for publication in Proc. 45th Annual Meeting of the Cognitive Science Society (CogSci-2024)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

