Sound

Authors and titles for cs.SD in Jan 2023

[ total of 104 entries: 1-104 ]

[1] arXiv:2301.00508 [pdf, other]: Title: EmoGator: A New Open Source Vocal Burst Dataset with Baseline Machine Learning Classification Methodologies

Authors: Fred W. Buhl

Comments: 12 pages, 4 tables, 2 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2301.01162 [pdf, other]: Title: Language Models are Drummers: Drum Composition with Natural Language Pre-Training

Authors: Li Zhang, Chris Callison-Burch

Comments: Accepted to the 1st workshop on Creative AI across Modalities in AAAI 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[3] arXiv:2301.01378 [pdf, other]: Title: An ensemble-based framework for mispronunciation detection of Arabic phonemes

Authors: Sukru Selim Calik, Ayhan Kucukmanisa, Zeynep Hilal Kilimci

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4] arXiv:2301.01578 [pdf, other]: Title: Validity in Music Information Research Experiments

Authors: Bob L. T. Sturm, Arthur Flexer

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5] arXiv:2301.02385 [pdf, other]: Title: Multi-Genre Music Transformer -- Composing Full Length Musical Piece

Authors: Abhinav Kaushal Keshari

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6] arXiv:2301.02732 [pdf, ps, other]: Title: Multimodal Lyrics-Rhythm Matching

Authors: Callie C. Liao, Duoduo Liao, Jesse Guessford

Comments: Accepted by 2022 IEEE International Conference on Big Data (IEEE Big Data 2022)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7] arXiv:2301.02884 [pdf, other]: Title: TunesFormer: Forming Irish Tunes with Control Codes by Bar Patching

Authors: Shangda Wu, Xiaobing Li, Feng Yu, Maosong Sun

Comments: 6 pages, 1 figure, 1 table, accepted by HCMIR 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:2301.02886 [pdf, other]: Title: Perceptual-Neural-Physical Sound Matching

Authors: Han Han, Vincent Lostanlen, Mathieu Lagrange

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9] arXiv:2301.03206 [pdf, other]: Title: Introducing Model Inversion Attacks on Automatic Speaker Recognition

Authors: Karla Pizzi, Franziska Boenisch, Ugur Sahin, Konstantin Böttinger

Comments: for associated pdf, see this https URL

Journal-ref: Proc. 2nd Symposium on Security and Privacy in Speech Communication, 2022

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[10] arXiv:2301.03751 [pdf, other]: Title: Generative Emotional AI for Speech Emotion Recognition: The Case for Synthetic Emotional Speech Augmentation

Authors: Abdullah Shahid, Siddique Latif, Junaid Qadir

Comments: Under review

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11] arXiv:2301.03801 [pdf, other]: Title: UnifySpeech: A Unified Framework for Zero-shot Text-to-Speech and Voice Conversion

Authors: Haogeng Liu, Tao Wang, Ruibo Fu, Jiangyan Yi, Zhengqi Wen, Jianhua Tao

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12] arXiv:2301.04320 [pdf, other]: Title: Rethinking complex-valued deep neural networks for monaural speech enhancement

Authors: Haibin Wu, Ke Tan, Buye Xu, Anurag Kumar, Daniel Wong

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13] arXiv:2301.04388 [pdf, other]: Title: Perceive and predict: self-supervised speech representation based loss functions for speech enhancement

Authors: George Close, William Ravenscroft, Thomas Hain, Stefan Goetze

Comments: 4 pages, accepted at ICASSP 2023

Journal-ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14] arXiv:2301.04488 [pdf, other]: Title: WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning

Authors: Kejun Zhang, Xinda Wu, Tieyao Zhang, Zhijie Huang, Xu Tan, Qihao Liang, Songruoyao Wu, Lingyun Sun

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:2301.05898 [pdf, ps, other]: Title: Acoustic correlates of the syllabic rhythm of speech: Modulation spectrum or local features of the temporal envelope

Authors: Yuran Zhang, Jiajie Zou, Nai Ding

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[16] arXiv:2301.05908 [pdf, other]: Title: An Order-Complexity Model for Aesthetic Quality Assessment of Symbolic Homophony Music Scores

Authors: Xin Jin, Wu Zhou, Jinyu Wang, Duo Xu, Yiqing Rong, Shuai Cui

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[17] arXiv:2301.06078 [pdf, ps, other]: Title: Training one model to detect heart and lung sound events from single point auscultations

Authors: Leander Melms, Robert R. Ilesan, Ulrich Köhler, Olaf Hildebrandt, Regina Conradt, Jens Eckstein, Cihan Atila, Sami Matrood, Bernhard Schieffer, Jürgen R. Schaefer, Tobias Müller, Julius Obergassel, Nadine Schlicker, Martin C. Hirsch

Comments: 14 pages, 8 figures

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:2301.06211 [pdf, ps, other]: Title: What artificial intelligence might teach us about the origin of human language

Authors: Alexander Kilpatrick

Comments: ICPHS2023 Conference Submission. 5 pages

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19] arXiv:2301.06277 [pdf, ps, other]: Title: Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

Authors: Kai Liu, Xucheng Wan, Ziqing Du, Huan Zhou

Comments: Accepted by NCMMSC 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[20] arXiv:2301.06468 [pdf, other]: Title: Msanii: High Fidelity Music Synthesis on a Shoestring Budget

Authors: Kinyugo Maina

Comments: 15 pages, 8 figures, for demo see this https URL and for code, see this https URL, this paper is a work in progress

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:2301.06735 [pdf, other]: Title: Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

Authors: Zhanheng Yang, Sining Sun, Xiong Wang, Yike Zhang, Long Ma, Lei Xie

Comments: Accepted by Interspeech 2023

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[22] arXiv:2301.07491 [pdf, ps, other]: Title: The Newsbridge -Telecom SudParis VoxCeleb Speaker Recognition Challenge 2022 System Description

Authors: Yannis Tevissen (ARMEDIA-SAMOVAR), Jérôme Boudy (ARMEDIA-SAMOVAR), Frédéric Petitpont

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[23] arXiv:2301.07665 [pdf, other]: Title: An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms

Authors: Anastasia Natsiou, Luca Longo, Sean O'Leary

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:2301.07851 [pdf, other]: Title: From English to More Languages: Parameter-Efficient Model Reprogramming for Cross-Lingual Speech Recognition

Authors: Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen, Rohit Prabhavalkar, Tara N. Sainath, Trevor Strohman

Comments: Submitted to ICASSP 2023. The project was initiated in May 2022 during a research internship at Google Research

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[25] arXiv:2301.07939 [pdf, other]: Title: THLNet: two-stage heterogeneous lightweight network for monaural speech enhancement

Authors: Feng Dang, Qi Hu, Pengyuan Zhang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:2301.07978 [pdf, other]: Title: SpotHitPy: A Study For ML-Based Song Hit Prediction Using Spotify

Authors: Ioannis Dimolitsas, Spyridon Kantarelis, Afroditi Fouka

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27] arXiv:2301.08620 [pdf, other]: Title: Adjoint-Based Identification of Sound Sources for Sound Reinforcement and Source Localization

Authors: Mathias Lemke, Lewin Stein

Journal-ref: Notes on Numerical Fluid Mechanics and Multidisciplinary Design, vol 145. Springer (2021)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Fluid Dynamics (physics.flu-dyn)
[28] arXiv:2301.09027 [pdf, other]: Title: Cellular Network Speech Enhancement: Removing Background and Transmission Noise

Authors: Amanda Shu, Hamza Khalid, Haohui Liu, Shikhar Agnihotri, Joseph Konan, Ojas Bhargave

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[29] arXiv:2301.09362 [pdf, other]: Title: A Comprehensive Survey on Heart Sound Analysis in the Deep Learning Era

Authors: Zhao Ren, Yi Chang, Thanh Tam Nguyen, Yang Tan, Kun Qian, Björn W. Schuller

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30] arXiv:2301.10015 [pdf, other]: Title: Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics

Authors: Gurunath Reddy M, Zhe Zhang, Yi Yu, Florian Harscoet, Simon Canales, Suhua Tang

Comments: arXiv admin note: substantial text overlap with arXiv:2011.06380

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[31] arXiv:2301.10183 [pdf, other]: Title: Mesostructures: Beyond Spectrogram Loss in Differentiable Time-Frequency Analysis

Authors: Cyrus Vahidi, Han Han, Changhong Wang, Mathieu Lagrange, György Fazekas, Vincent Lostanlen

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[32] arXiv:2301.10335 [pdf, other]: Title: Multilingual Multiaccented Multispeaker TTS with RADTTS

Authors: Rohan Badlani, Rafael Valle, Kevin J. Shih, João Felipe Santos, Siddharth Gururani, Bryan Catanzaro

Comments: 5 pages, submitted to ICASSP 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[33] arXiv:2301.10477 [pdf, other]: Title: HEAR4Health: A blueprint for making computer audition a staple of modern healthcare

Authors: Andreas Triantafyllopoulos, Alexander Kathan, Alice Baird, Lukas Christ, Alexander Gebhard, Maurice Gerczuk, Vincent Karas, Tobias Hübner, Xin Jing, Shuo Liu, Adria Mallol-Ragolta, Manuel Milling, Sandra Ottl, Anastasia Semertzidou, Srividya Tirunellai Rajamani, Tianhao Yan, Zijiang Yang, Judith Dineley, Shahin Amiriparian, Katrin D. Bartl-Pokorny, Anton Batliner, Florian B. Pokorny, Björn W. Schuller

Subjects: Sound (cs.SD); Computers and Society (cs.CY); Audio and Speech Processing (eess.AS)
[34] arXiv:2301.10587 [pdf, other]: Title: On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

Authors: Philippe Gonzalez, Tommy Sonne Alstrøm, Tobias May

Comments: Accepted to ICASSP 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35] arXiv:2301.11325 [pdf, other]: Title: MusicLM: Generating Music From Text

Authors: Andrea Agostinelli, Timo I. Denk, Zalán Borsos, Jesse Engel, Mauro Verzetti, Antoine Caillon, Qingqing Huang, Aren Jansen, Adam Roberts, Marco Tagliasacchi, Matt Sharifi, Neil Zeghidour, Christian Frank

Comments: Supplementary material at this https URL and this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36] arXiv:2301.12084 [pdf, other]: Title: Automated Arrangements of Multi-Part Music for Sets of Monophonic Instruments

Authors: Matthew Mccloskey, Gabrielle Curcio, Amulya Badineni, Kevin Mcgrath, Dimitris Papamichail

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[37] arXiv:2301.12209 [pdf, other]: Title: who is snoring? snore based user recognition

Authors: Shenghao Li, Jagmohan Chauhan

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:2301.12343 [pdf, other]: Title: Achieving Timestamp Prediction While Recognizing with Non-Autoregressive End-to-End ASR Model

Authors: Xian Shi, Yanni Chen, Shiliang Zhang, Zhijie Yan

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[39] arXiv:2301.12354 [pdf, other]: Title: Artistic Curve Steganography Carried by Musical Audio

Authors: Christopher J. Tralie

Comments: 18 pages, 14 figures, in Proceedings of EvoMUSART 2023

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[40] arXiv:2301.12503 [pdf, other]: Title: AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

Authors: Haohe Liu, Zehua Chen, Yi Yuan, Xinhao Mei, Xubo Liu, Danilo Mandic, Wenwu Wang, Mark D. Plumbley

Comments: Accepted by ICML 2023. Demo and implementation at this https URL Evaluation toolbox at this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[41] arXiv:2301.12525 [pdf, other]: Title: Composer's Assistant: An Interactive Transformer for Multi-Track MIDI Infilling

Authors: Martin E. Malandro

Comments: 12 pages, 6 figures, 3 tables. To be published in ISMIR 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[42] arXiv:2301.12661 [pdf, other]: Title: Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models

Authors: Rongjie Huang, Jiawei Huang, Dongchao Yang, Yi Ren, Luping Liu, Mingze Li, Zhenhui Ye, Jinglin Liu, Xiang Yin, Zhou Zhao

Comments: Audio samples are available at this https URL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[43] arXiv:2301.12662 [pdf, other]: Title: SingSong: Generating musical accompaniments from singing

Authors: Chris Donahue, Antoine Caillon, Adam Roberts, Ethan Manilow, Philippe Esling, Andrea Agostinelli, Mauro Verzetti, Ian Simon, Olivier Pietquin, Neil Zeghidour, Jesse Engel

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[44] arXiv:2301.13267 [pdf, ps, other]: Title: ArchiSound: Audio Generation with Diffusion

Authors: Flavio Schneider

Comments: Master Thesis at ETH Zurich

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[45] arXiv:2301.13380 [pdf, other]: Title: Automated Time-frequency Domain Audio Crossfades using Graph Cuts

Authors: Kyle Robinson, Dan Brown

Journal-ref: Late Breaking/Demo at the 20th International Society for Music Information Retrieval, Delft, The Netherlands, 2019

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[46] arXiv:2301.13383 [pdf, other]: Title: An Comparative Analysis of Different Pitch and Metrical Grid Encoding Methods in the Task of Sequential Music Generation

Authors: Yuqiang Li, Shengchen Li, George Fazekas

Comments: This is a draft prior to submission to TISMIR as a journal paper

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47] arXiv:2301.13662 [pdf, other]: Title: InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt

Authors: Dongchao Yang, Songxiang Liu, Rongjie Huang, Chao Weng, Helen Meng

Comments: Submitted to TASLP

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48] arXiv:2301.00142 (cross-list from cs.HC) [pdf, other]: Title: Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence

Authors: Björn W. Schuller, Shahin Amiriparian, Anton Batliner, Alexander Gebhard, Maurice Gerczuk, Vincent Karas, Alexander Kathan, Lennart Seizer, Johanna Löchner

Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:2301.00304 (cross-list from cs.CL) [pdf, other]: Title: Sample-Efficient Unsupervised Domain Adaptation of Speech Recognition Systems A case study for Modern Greek

Authors: Georgios Paraskevopoulos, Theodoros Kouzelis, Georgios Rouvalis, Athanasios Katsamanis, Vassilis Katsouros, Alexandros Potamianos

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:2301.00591 (cross-list from cs.CL) [pdf, other]: Title: Analysing Discrete Self Supervised Speech Representation for Spoken Language Modeling

Authors: Amitay Sicherman, Yossi Adi

Comments: Accepted at ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[51] arXiv:2301.01020 (cross-list from cs.CL) [pdf, other]: Title: Supervised Acoustic Embeddings And Their Transferability Across Languages

Authors: Sreepratha Ram, Hanan Aldarmaki

Comments: Presented at ICNLSP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52] arXiv:2301.01456 (cross-list from cs.CV) [pdf, other]: Title: Audio-Visual Efficient Conformer for Robust Speech Recognition

Authors: Maxime Burchi, Radu Timofte

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53] arXiv:2301.02111 (cross-list from cs.CL) [pdf, other]: Title: Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers

Authors: Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

Comments: Work in progress

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[54] arXiv:2301.02184 (cross-list from cs.CV) [pdf, other]: Title: Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations

Authors: Sagnik Majumder, Hao Jiang, Pierre Moulon, Ethan Henderson, Paul Calamia, Kristen Grauman, Vamsi Krishna Ithapu

Comments: Accepted to CVPR 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55] arXiv:2301.03238 (cross-list from cs.CL) [pdf, ps, other]: Title: MAQA: A Multimodal QA Benchmark for Negation

Authors: Judith Yue Li, Aren Jansen, Qingqing Huang, Joonseok Lee, Ravi Ganti, Dima Kuzmin

Comments: NeurIPS 2022 SyntheticData4ML Workshop

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[56] arXiv:2301.04474 (cross-list from cs.CV) [pdf, other]: Title: Speech Driven Video Editing via an Audio-Conditioned Diffusion Model

Authors: Dan Bigioi, Shubhajit Basak, Michał Stypułkowski, Maciej Zięba, Hugh Jordan, Rachel McDonnell, Peter Corcoran

Comments: 8 Pages, code and project page available here: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57] arXiv:2301.06267 (cross-list from cs.CV) [pdf, other]: Title: Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models

Authors: Zhiqiu Lin, Samuel Yu, Zhiyi Kuang, Deepak Pathak, Deva Ramanan

Comments: CVPR 2023. Project website: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[58] arXiv:2301.06375 (cross-list from cs.MM) [pdf, other]: Title: OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset

Authors: Jeongkyun Park, Jung-Wook Hwang, Kwanghee Choi, Seung-Hyun Lee, Jun Hwan Ahn, Rae-Hong Park, Hyung-Min Park

Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[59] arXiv:2301.06475 (cross-list from cs.CL) [pdf, ps, other]: Title: Using Kaldi for Automatic Speech Recognition of Conversational Austrian German

Authors: Julian Linke, Saskia Wepner, Gernot Kubin, Barbara Schuppler

Comments: 10 pages, 2 figures, 4 tables

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60] arXiv:2301.06916 (cross-list from cs.CL) [pdf, other]: Title: Automated speech- and text-based classification of neuropsychiatric conditions in a multidiagnostic setting

Authors: Lasse Hansen, Roberta Rocca, Arndis Simonsen, Alberto Parola, Vibeke Bliksted, Nicolai Ladegaard, Dan Bang, Kristian Tylén, Ethan Weed, Søren Dinesen Østergaard, Riccardo Fusaroli

Comments: 24 pages, 5 figures

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Applications (stat.AP)
[61] arXiv:2301.07087 (cross-list from cs.CL) [pdf, other]: Title: MooseNet: A Trainable Metric for Synthesized Speech with a PLDA Module

Authors: Ondřej Plátek, Ondřej Dušek

Comments: Accepted to SSW 12: this https URL

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[62] arXiv:2301.07829 (cross-list from cs.HC) [pdf, other]: Title: Warning: Humans Cannot Reliably Detect Speech Deepfakes

Authors: Kimberly T. Mai, Sergi D. Bray, Toby Davies, Lewis D. Griffin

Journal-ref: PLoS ONE 18(8) (2023): e0285333

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63] arXiv:2301.08145 (cross-list from cs.IR) [pdf, other]: Title: Music Playlist Title Generation Using Artist Information

Authors: Haven Kim, SeungHeon Doh, Junwon Lee, Juhan Nam

Comments: AAAI-23 Workshop on Creative AI Across Modalities

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64] arXiv:2301.08562 (cross-list from cs.LG) [pdf, other]: Title: Latent Autoregressive Source Separation

Authors: Emilian Postolache, Giorgio Mariani, Michele Mancusi, Andrea Santilli, Luca Cosmo, Emanuele Rodolà

Comments: Accepted to AAAI 2023

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65] arXiv:2301.08730 (cross-list from cs.CV) [pdf, other]: Title: Novel-View Acoustic Synthesis

Authors: Changan Chen, Alexander Richard, Roman Shapovalov, Vamsi Krishna Ithapu, Natalia Neverova, Kristen Grauman, Andrea Vedaldi

Comments: Accepted at CVPR 2023. Project page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66] arXiv:2301.08810 (cross-list from cs.CL) [pdf, other]: Title: Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Authors: Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67] arXiv:2301.09080 (cross-list from cs.MM) [pdf, other]: Title: Dance2MIDI: Dance-driven multi-instruments music generation

Authors: Bo Han, Yuheng Li, Yixuan Shen, Yi Ren, Feilin Han

Comments: Accepted by Computational Visual Media Journal

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68] arXiv:2301.09099 (cross-list from cs.CL) [pdf, ps, other]: Title: Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study

Authors: Massa Baali, Tomoki Hayashi, Hamdy Mubarak, Soumi Maiti, Shinji Watanabe, Wassim El-Hajj, Ahmed Ali

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69] arXiv:2301.10047 (cross-list from cs.GR) [pdf, other]: Title: DiffMotion: Speech-Driven Gesture Synthesis Using Denoising Diffusion Model

Authors: Fan Zhang, Naye Ji, Fuxing Gao, Yongping Li

Comments: 13 pages, 3 figures

Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[70] arXiv:2301.10056 (cross-list from cs.CR) [pdf, ps, other]: Title: Side Eye: Characterizing the Limits of POV Acoustic Eavesdropping from Smartphone Cameras with Rolling Shutters and Movable Lenses

Authors: Yan Long, Pirouz Naghavi, Blas Kojusner, Kevin Butler, Sara Rampazzi, Kevin Fu

Journal-ref: 2023 IEEE Symposium on Security and Privacy

Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71] arXiv:2301.10180 (cross-list from cs.CL) [pdf, other]: Title: A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset

Authors: Javad Peymanfard, Samin Heydarian, Ali Lashini, Hossein Zeinali, Mohammad Reza Mohammadi, Nasser Mozayani

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72] arXiv:2301.10295 (cross-list from cs.CV) [pdf, other]: Title: Object Segmentation with Audio Context

Authors: Kaihui Zheng, Yuqing Ren, Zixin Shen, Tianxu Qin

Comments: Research project for Introduction to Deep Learning (11785) at Carnegie Mellon University

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2301.10314 (cross-list from cs.HC) [pdf, other]: Title: WhisperWand: Simultaneous Voice and Gesture Tracking Interface

Authors: Yang Bai, Irtaza Shahid, Harshvardhan Takawale, Nirupam Roy

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2301.10606 (cross-list from cs.CL) [pdf, other]: Title: A Holistic Cascade System, benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation

Authors: Wen-Chin Huang, Benjamin Peloquin, Justine Kao, Changhan Wang, Hongyu Gong, Elizabeth Salesky, Yossi Adi, Ann Lee, Peng-Jen Chen

Comments: This is the full version of our submission to ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75] arXiv:2301.11716 (cross-list from cs.CL) [pdf, other]: Title: Pre-training for Speech Translation: CTC Meets Optimal Transport

Authors: Phuong-Hang Le, Hongyu Gong, Changhan Wang, Juan Pino, Benjamin Lecouteux, Didier Schwab

Comments: ICML 2023 (oral presentation). This version fixed URLs, updated affiliations & acknowledgements, and improved formatting

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[76] arXiv:2301.11757 (cross-list from cs.CL) [pdf, other]: Title: Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion

Authors: Flavio Schneider, Ojasv Kamal, Zhijing Jin, Bernhard Schölkopf

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[77] arXiv:2301.11975 (cross-list from cs.LG) [pdf, other]: Title: Byte Pair Encoding for Symbolic Music

Authors: Nathan Fradet, Nicolas Gutowski, Fabien Chhel, Jean-Pierre Briot

Comments: EMNLP 2023, source code: this https URL

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:2301.12331 (cross-list from cs.CL) [pdf, other]: Title: Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker

Authors: Navjot Kaur, Paige Tuttosi

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2301.12686 (cross-list from cs.LG) [pdf, other]: Title: GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration

Authors: Naoki Murata, Koichi Saito, Chieh-Hsin Lai, Yuhta Takida, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2301.13003 (cross-list from cs.CL) [pdf, other]: Title: Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation

Authors: Minglun Han, Feilong Chen, Jing Shi, Shuang Xu, Bo Xu

Comments: Accepted by INTERSPEECH 2023

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2301.13507 (cross-list from cs.IR) [pdf, ps, other]: Title: An Analysis of Classification Approaches for Hit Song Prediction using Engineered Metadata Features with Lyrics and Audio Features

Authors: Mengyisong Zhao, Morgan Harvey, David Cameron, Frank Hopfgartner, Valerie J. Gillet

Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2301.00448 (cross-list from eess.AS) [pdf, other]: Title: Unsupervised Acoustic Scene Mapping Based on Acoustic Features and Dimensionality Reduction

Authors: Idan Cohen, Ofir Lindenbaum, Sharon Gannot

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[83] arXiv:2301.00646 (cross-list from eess.AS) [pdf, other]: Title: Addressing the Selection Bias in Voice Assistance: Training Voice Assistance Model in Python with Equal Data Selection

Authors: Kashav Piya, Srijal Shrestha, Cameran Frank, Estephanos Jebessa, Tauheed Khan Mohd

Subjects: Audio and Speech Processing (eess.AS); Multiagent Systems (cs.MA); Robotics (cs.RO); Sound (cs.SD)
[84] arXiv:2301.00833 (cross-list from eess.AS) [pdf, other]: Title: Hyperuniform disordered parametric loudspeaker array

Authors: Kun Tang, Yuqi Wang, Shaobo Wang, Da Gao, Haojie Li, Xindong Liang, Patrick Sebbah, Yibin Li, Jin Zhang, Junhui Shi

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Applied Physics (physics.app-ph)
[85] arXiv:2301.01361 (cross-list from eess.AS) [pdf, other]: Title: Modeling the Rhythm from Lyrics for Melody Generation of Pop Song

Authors: Daiyu Zhang, Ju-Chiang Wang, Katerina Kosta, Jordan B. L. Smith, Shicen Zhou

Comments: Published in ISMIR 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[86] arXiv:2301.01595 (cross-list from quant-ph) [pdf, other]: Title: Quantum Representations of Sound: from mechanical waves to quantum circuits

Authors: Paulo V. Itaborai, Eduardo R. Miranda

Comments: 29 pages, 26 figures. Accompanying Python package is available: this https URL

Subjects: Quantum Physics (quant-ph); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2301.02214 (cross-list from eess.AS) [pdf, other]: Title: Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks

Authors: Zifan Jiang, Adrian Soldati, Isaac Schamberg, Adriano R. Lameira, Steven Moran

Comments: This paper is published as: Jiang, Zifan, Adrian Soldati, Isaac Schamberg, Adriano R. Lameira and Steven Moran. Automatic Sound Event Detection and Classification of Great Ape Calls Using Neural Networks. In Proceedings of the 20th International Congress of Phonetic Sciences (ICPhS 2023), 3100-3104, Prague, Czech Republic

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[88] arXiv:2301.02262 (cross-list from eess.AS) [pdf, other]: Title: Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation

Authors: Miku Nishihara, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[89] arXiv:2301.02736 (cross-list from eess.AS) [pdf, other]: Title: Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

Authors: David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[90] arXiv:2301.04606 (cross-list from eess.AS) [pdf, other]: Title: Modelling low-resource accents without accent-specific TTS frontend

Authors: Georgi Tinchev, Marta Czarnowska, Kamil Deja, Kayoko Yanagisawa, Marius Cotescu

Comments: The first two authors contributed equally to this work. In Review. Samples available on this https URL

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[91] arXiv:2301.05025 (cross-list from math.HO) [pdf, other]: Title: Topological data analysis hearing the shapes of drums and bells

Authors: Guo-Wei Wei

Comments: 4 pages, 2 figures

Subjects: History and Overview (math.HO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2301.05295 (cross-list from eess.AS) [pdf, other]: Title: Rock Guitar Tablature Generation via Natural Language Processing

Authors: Josue Casco-Rodriguez

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[93] arXiv:2301.05868 (cross-list from eess.AS) [pdf, other]: Title: Modulation spectral features for speech emotion recognition using deep neural networks

Authors: Premjeet Singh, Md Sahidullah, Goutam Saha

Comments: Accepted for publication in Elsevier's Speech Communication Journal

Journal-ref: Volume 146, January 2023, Pages 53-69

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94] arXiv:2301.06458 (cross-list from eess.AS) [pdf, other]: Title: Multi-resolution location-based training for multi-channel continuous speech separation

Authors: Hassan Taherian, DeLiang Wang

Comments: Submitted to ICASSP 23

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[95] arXiv:2301.07173 (cross-list from eess.AS) [pdf, other]: Title: Towards Voice Reconstruction from EEG during Imagined Speech

Authors: Young-Eun Lee, Seo-Hyun Lee, Sang-Ho Kim, Seong-Whan Lee

Comments: 9 pages, 4 figures, accepted paper of AAAI 2023 in main track

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Sound (cs.SD); Signal Processing (eess.SP)
[96] arXiv:2301.08925 (cross-list from eess.AS) [pdf, other]: Title: New Challenges for Content Privacy in Speech and Audio

Authors: Jennifer Williams, Karla Pizzi, Shuvayanti Das, Paul-Gauthier Noe

Comments: Accepted for publication in ISCA SPSC Symposium 2022

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[97] arXiv:2301.09198 (cross-list from eess.AS) [pdf, other]: Title: Estimation of Source and Receiver Positions, Room Geometry and Reflection Coefficients From a Single Room Impulse Response

Authors: Wangyang Yu, W. Bastiaan Kleijn

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[98] arXiv:2301.10210 (cross-list from eess.AS) [pdf, ps, other]: Title: Perceptual evaluation of listener envelopment using spatial granular synthesis

Authors: Stefan Riedel, Matthias Frank, Franz Zotter

Comments: Submitted to the Journal of the Audio Engineering Society (JAES)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[99] arXiv:2301.11176 (cross-list from eess.AS) [pdf, ps, other]: Title: A simple model for pink noise from amplitude modulations

Authors: Masahiro Morikawa, Akika Nakamichi

Comments: 12 pages, 9 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Classical Physics (physics.class-ph)
[100] arXiv:2301.11276 (cross-list from eess.AS) [pdf, other]: Title: BayesSpeech: A Bayesian Transformer Network for Automatic Speech Recognition

Authors: Will Rieger

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[101] arXiv:2301.11446 (cross-list from eess.AS) [pdf, other]: Title: On granularity of prosodic representations in expressive text-to-speech

Authors: Mikolaj Babianski, Kamil Pokora, Raahil Shah, Rafal Sienkiewicz, Daniel Korzekwa, Viacheslav Klimkov

Comments: Accepted to IEEE SLT 2022

Journal-ref: 2022 IEEE Spoken Language Technology Workshop (SLT), pp. 892-899

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[102] arXiv:2301.12258 (cross-list from eess.AS) [pdf, other]: Title: Cross-domain Neural Pitch and Periodicity Estimation

Authors: Max Morrison, Caedon Hsieh, Nathan Pruyne, Bryan Pardo

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[103] arXiv:2301.12363 (cross-list from eess.AS) [pdf, other]: Title: NeuralKalman: A Learnable Kalman Filter for Acoustic Echo Cancellation

Authors: Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang

Comments: The term of the algorithm is renamed because it conflicts with an existing KalmanNet algorithm proposed by Revach et al. (arXiv:2107.10043); Accepted by ASRU 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[104] arXiv:2301.13341 (cross-list from eess.AS) [pdf, other]: Title: Neural Target Speech Extraction: An Overview

Authors: Katerina Zmolikova, Marc Delcroix, Tsubasa Ochiai, Keisuke Kinoshita, Jan Černocký, Dong Yu

Comments: Submitted to IEEE Signal Processing Magazine on Apr. 25, 2022, and accepted on Jan. 12, 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
