Sound

Authors and titles for recent submissions, skipping first 84

Tue, 23 Apr 2024

[20] arXiv:2404.14063 [pdf, other]: Title: LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search

Authors: Jinyue Guo, Anna-Maria Christodoulou, Balint Laczko, Kyrre Glette

Comments: Accepted to GECCO 24 Companion

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[21] arXiv:2404.13914 [pdf, other]: Title: Audio Anti-Spoofing Detection: A Survey

Authors: Menglu Li, Yasaman Ahmadiadli, Xiao-Ping Zhang

Comments: submitted to ACM Computing Surveys

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[22] arXiv:2404.13892 [pdf, other]: Title: Retrieval-Augmented Audio Deepfake Detection

Authors: Zuheng Kang, Yayun He, Botao Zhao, Xiaoyang Qu, Junqing Peng, Jing Xiao, Jianzong Wang

Comments: Accepted by the 2024 International Conference on Multimedia Retrieval (ICMR 2024)

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23] arXiv:2404.13789 [pdf, other]: Title: Anchor-aware Deep Metric Learning for Audio-visual Retrieval

Authors: Donghuo Zeng, Yanan Wang, Kazushi Ikeda, Yi Yu

Comments: 9 pages, 5 figures. Accepted by ACM ICMR 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[24] arXiv:2404.13569 [pdf, other]: Title: Musical Word Embedding for Music Tagging and Retrieval

Authors: SeungHeon Doh, Jongpil Lee, Dasaem Jeong, Juhan Nam

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25] arXiv:2404.13568 [pdf, ps, other]: Title: Sparse Direction of Arrival Estimation Method Based on Vector Signal Reconstruction with a Single Vector Sensor

Authors: Jiabin Guo

Comments: 20 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2404.13551 [pdf, other]: Title: AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition

Authors: Kin Wai Lau, Yasar Abbas Ur Rehman, Lai-Man Po

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27] arXiv:2404.13509 [pdf, ps, other]: Title: MFHCA: Enhancing Speech Emotion Recognition Via Multi-Spatial Fusion and Hierarchical Cooperative Attention

Authors: Xinxin Jiao, Liejun Wang, Yinfeng Yu

Comments: Main paper (5 pages). Accepted for publication by ICME 2024

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[28] arXiv:2404.13428 [pdf, ps, other]: Title: Text-dependent Speaker Verification (TdSV) Challenge 2024: Challenge Evaluation Plan

Authors: Zeinali Hossein, Lee Kong Aik, Alam Jahangir, Burget Lukas

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[29] arXiv:2404.13358 [pdf, other]: Title: Music Consistency Models

Authors: Zhengcong Fei, Mingyuan Fan, Junshi Huang

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[30] arXiv:2404.13286 [pdf, other]: Title: Track Role Prediction of Single-Instrumental Sequences

Authors: Changheon Han, Suhyun Lee, Minsam Ko

Comments: ISMIR LBD 2023

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[31] arXiv:2404.13821 (cross-list from cs.HC) [pdf, other]: Title: Robotic Blended Sonification: Consequential Robot Sound as Creative Material for Human-Robot Interaction

Authors: Stine S. Johansen, Yanto Browning, Anthony Brumpton, Jared Donovan, Markus Rittenbruch

Comments: Paper accepted at ISEA 24, The 29th International Symposium on Electronic Art, Brisbane, Australia, 21-29 June 2024

Subjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[32] arXiv:2404.13289 (cross-list from cs.CL) [pdf, other]: Title: Double Mixture: Towards Continual Event Detection from Speech

Authors: Jingqi Kang, Tongtong Wu, Jinming Zhao, Guitao Wang, Yinwei Wei, Hao Yang, Guilin Qi, Yuan-Fang Li, Gholamreza Haffari

Comments: The first two authors contributed equally to this work

Subjects: Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33] arXiv:2404.13140 (cross-list from quant-ph) [pdf, ps, other]: Title: Intro to Quantum Harmony: Chords in Superposition

Authors: Christopher Dobrian, Omar Costa Hamido

Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 22 Apr 2024

[34] arXiv:2404.13008 [pdf, other]: Title: Enhancing Generalization in Audio Deepfake Detection: A Neural Collapse based Sampling and Training Approach

Authors: Mohammed Yousif, Jonat John Mathew, Huzaifa Pallan, Agamjeet Singh Padda, Syed Daniyal Shah, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35] arXiv:2404.12979 [pdf, other]: Title: TRNet: Two-level Refinement Network leveraging Speech Enhancement for Noise Robust Speech Emotion Recognition

Authors: Chengxin Chen, Pengyuan Zhang

Comments: 13 pages, 3 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36] arXiv:2404.12725 [pdf, other]: Title: Separate in the Speech Chain: Cross-Modal Conditional Audio-Visual Target Speech Extraction

Authors: Zhaoxi Mu, Xinyu Yang

Comments: Accepted by IJCAI 2024

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

Fri, 19 Apr 2024

[37] arXiv:2404.12132 [pdf, other]: Title: Enhancing Suicide Risk Assessment: A Speech-Based Automated Approach in Emergency Medicine

Authors: Shahin Amiriparian, Maurice Gerczuk, Justina Lutz, Wolfgang Strube, Irina Papazova, Alkomiet Hasan, Alexander Kathan, Björn W. Schuller

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[38] arXiv:2404.12077 [pdf, other]: Title: TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches

Authors: Rong Wang, Kun Sun

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[39] arXiv:2404.12062 [pdf, other]: Title: MIDGET: Music Conditioned 3D Dance Generation

Authors: Jinwu Wang, Wei Mao, Miaomiao Liu

Comments: 12 pages, 6 figures. Published in AI 2023: Advances in Artificial Intelligence

Journal-ref: In Australasian Joint Conference on Artificial Intelligence (pp. 277-288). Singapore: Springer Nature Singapore 2023

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Audio and Speech Processing (eess.AS)
[40] arXiv:2404.11976 [pdf, other]: Title: Large Language Models: From Notes to Musical Form

Authors: Lilac Atassi

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:2404.12299 (cross-list from cs.CL) [pdf, other]: Title: Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair

Authors: Yusuke Sakai, Mana Makinae, Hidetaka Kamigaito, Taro Watanabe

Comments: 23 pages, 9 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:2404.12251 (cross-list from cs.LG) [pdf, other]: Title: Dynamic Modality and View Selection for Multimodal Emotion Recognition with Missing Modalities

Authors: Luciana Trinkaus Menon, Luiz Carlos Ribeiro Neduziak, Jean Paul Barddal, Alessandro Lameiras Koerich, Alceu de Souza Britto Jr

Comments: 15 pages

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43] arXiv:2404.11938 (cross-list from cs.MM) [pdf, other]: Title: HyDiscGAN: A Hybrid Distributed cGAN for Audio-Visual Privacy Preservation in Multimodal Sentiment Analysis

Authors: Zhuojia Wu, Qi Zhang, Duoqian Miao, Kun Yi, Wei Fan, Liang Hu

Comments: 13 pages, IJCAI-2024

Subjects: Multimedia (cs.MM); Distributed, Parallel, and Cluster Computing (cs.DC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:2404.11619 (cross-list from eess.AS) [pdf, ps, other]: Title: Advancing Speech Translation: A Corpus of Mandarin-English Conversational Telephone Speech

Authors: Shannon Wotherspoon, William Hartmann, Matthew Snover

Comments: 2 pages

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
