Sound

Authors and titles for recent submissions, skipping first 26

[ total of 123 entries: 1-50 | 27-76 | 77-123 ]

Fri, 26 May 2023 (continued, showing last 7 of 20 entries)

[27]  arXiv:2305.16076 (cross-list from eess.AS) [pdf, other]
Title: Transfer Learning for Personality Perception via Speech Emotion Recognition
Comments: Accepted to INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[28]  arXiv:2305.16065 (cross-list from eess.AS) [pdf, other]
Title: ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition
Comments: Accepted to INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[29]  arXiv:2305.16049 (cross-list from cs.CV) [pdf, other]
Title: CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition
Comments: to be published in INTERSPEECH 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[30]  arXiv:2305.15816 (cross-list from eess.AS) [pdf, other]
Title: DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
Comments: 23 pages, 10 figures, 17 tables, under review
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[31]  arXiv:2305.15760 (cross-list from cs.CL) [pdf, other]
Title: Svarah: Evaluating English ASR Systems on Indian Accents
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[32]  arXiv:2305.15663 (cross-list from cs.CL) [pdf, other]
Title: Mixture-of-Expert Conformer for Streaming Multilingual ASR
Comments: Accepted to Interspeech 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33]  arXiv:2305.15518 (cross-list from eess.AS) [pdf, other]
Title: Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model
Comments: Accepted to INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Thu, 25 May 2023

[34]  arXiv:2305.15127 [pdf, other]
Title: PLCMOS -- a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms
Comments: to appear: INTERSPEECH 2023, associated model release: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[35]  arXiv:2305.15055 [pdf, other]
Title: Iteratively Improving Speech Recognition and Voice Conversion
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[36]  arXiv:2305.14867 [pdf, other]
Title: Interactive Neural Resonators
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[37]  arXiv:2305.14402 [pdf, other]
Title: Improving Speech Emotion Recognition Performance using Differentiable Architecture Search
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38]  arXiv:2305.15403 (cross-list from cs.CL) [pdf, other]
Title: AV-TranSpeech: Audio-Visual Robust Speech-to-Speech Translation
Comments: Accepted to ACL 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39]  arXiv:2305.15386 (cross-list from cs.CL) [pdf, other]
Title: Vistaar: Diverse Benchmarks and Training Sets for Indian Language ASR
Comments: Accepted in INTERSPEECH 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40]  arXiv:2305.15266 (cross-list from eess.AS) [pdf, other]
Title: Diffusion-Based Audio Inpainting
Comments: Submitted for publication to the Journal of Audio Engineering Society on January 30th, 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[41]  arXiv:2305.15255 (cross-list from cs.CL) [pdf, other]
Title: LMs with a Voice: Spoken Language Modeling beyond Speech Tokens
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42]  arXiv:2305.14933 (cross-list from eess.AS) [pdf, other]
Title: Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation
Comments: To be published in InterSpeech 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43]  arXiv:2305.14875 (cross-list from cs.HC) [pdf, other]
Title: LoopBoxes -- Evaluation of a Collaborative Accessible Digital Musical Instrument
Comments: 10 pages, 9 figures, to be published in the Proceedings of the International Conference on New Interfaces for Musical Expression (NIME'23)
Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44]  arXiv:2305.14838 (cross-list from cs.CL) [pdf, other]
Title: ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45]  arXiv:2305.14778 (cross-list from eess.AS) [pdf, other]
Title: P-vectors: A Parallel-Coupled TDNN/Transformer Network for Speaker Verification
Comments: Accepted by INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[46]  arXiv:2305.14723 (cross-list from eess.AS) [pdf, other]
Title: Downstream Task Agnostic Speech Enhancement with Self-Supervised Representation Loss
Comments: 4 pages, 2 figures, Accepted to Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47]  arXiv:2305.14635 (cross-list from cs.CL) [pdf, other]
Title: CMOT: Cross-modal Mixup via Optimal Transport for Speech Translation
Comments: ACL 2023 main conference
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48]  arXiv:2305.14546 (cross-list from eess.AS) [pdf, other]
Title: On the Transferability of Whisper-based Representations for "In-the-Wild" Cross-Task Downstream Speech Applications
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[49]  arXiv:2305.14381 (cross-list from cs.LG) [pdf, other]
Title: Connecting Multi-modal Contrastive Representations
Comments: Demos are available at this https URL
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2305.14359 (cross-list from cs.MM) [pdf, other]
Title: Zero-shot personalized lip-to-speech synthesis with face image based voice control
Comments: ICASSP 2023
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Wed, 24 May 2023

[51]  arXiv:2305.14023 [pdf, other]
Title: Happy or Evil Laughter? Analysing a Database of Natural Audio Samples
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52]  arXiv:2305.13831 [pdf, other]
Title: ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Comments: Accepted by INTERSPEECH 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[53]  arXiv:2305.13796 [pdf, other]
Title: SE-Bridge: Speech Enhancement with Consistent Brownian Bridge
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[54]  arXiv:2305.13774 [pdf, other]
Title: ADD 2023: the Second Audio Deepfake Detection Challenge
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55]  arXiv:2305.13758 [pdf, other]
Title: A study of audio mixing methods for piano transcription in violin-piano ensembles
Comments: To Appear IEEE ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[56]  arXiv:2305.13724 [pdf, ps, other]
Title: ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings
Comments: 5 pages, accepted for INTERSPEECH 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[57]  arXiv:2305.13716 [pdf, other]
Title: BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
Comments: Accepted by INTERSPEECH 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[58]  arXiv:2305.13713 [pdf, ps, other]
Title: CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center
Comments: 5 pages, accepted for INTERSPEECH 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[59]  arXiv:2305.13701 [pdf, other]
Title: TO-Rawnet: Improving RawNet with TCN and Orthogonal Regularization for Fake Audio Detection
Comments: Interspeech 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60]  arXiv:2305.13700 [pdf, other]
Title: Detection of Cross-Dataset Fake Audio Based on Prosodic and Pronunciation Features
Comments: Interspeech 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61]  arXiv:2305.13612 [pdf, other]
Title: FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Comments: Accepted by ACL 2023 (Findings)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[62]  arXiv:2305.14097 (cross-list from cs.CR) [pdf, other]
Title: QFA2SR: Query-Free Adversarial Transfer Attacks to Speaker Recognition Systems
Comments: Accepted by the 32nd USENIX Security Symposium (2023 USENIX Security); Full Version
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63]  arXiv:2305.14079 (cross-list from eess.AS) [pdf, other]
Title: Masked Modeling Duo for Speech: Specializing General-Purpose Audio Representation to Speech using Denoising Distillation
Comments: Interspeech 2023; 5 pages, 2 figures, 6 tables, Code: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64]  arXiv:2305.14071 (cross-list from cs.CL) [pdf, other]
Title: Disentangled Variational Autoencoder for Emotion Recognition in Conversations
Comments: Accepted by IEEE Transactions on Affective Computing
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65]  arXiv:2305.14049 (cross-list from cs.CL) [pdf, other]
Title: Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding
Comments: Accepted by Interspeech 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66]  arXiv:2305.14042 (cross-list from cs.CL) [pdf, other]
Title: Improving speech translation by fusing speech and text
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67]  arXiv:2305.14035 (cross-list from cs.LG) [pdf, other]
Title: Can Self-Supervised Neural Networks Pre-Trained on Human Speech distinguish Animal Callers?
Comments: Accepted at Interspeech 2023
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68]  arXiv:2305.14032 (cross-list from eess.AS) [pdf, other]
Title: Patch-Mix Contrastive Learning with Audio Spectrogram Transformer on Respiratory Sound Classification
Comments: INTERSPEECH 2023, Code URL: this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[69]  arXiv:2305.13905 (cross-list from eess.AS) [pdf, other]
Title: EfficientSpeech: An On-Device Text to Speech Model
Authors: Rowel Atienza
Comments: To be presented at ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[70]  arXiv:2305.13580 (cross-list from eess.AS) [pdf, other]
Title: Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
Comments: Accepted at Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[71]  arXiv:2305.13516 (cross-list from cs.CL) [pdf, other]
Title: Scaling Speech Technology to 1,000+ Languages
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72]  arXiv:2305.13512 (cross-list from cs.CL) [pdf, other]
Title: Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding
Comments: 6 pages, 2 figures; Accepted by Interspeech 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73]  arXiv:2305.13408 (cross-list from eess.AS) [pdf, other]
Title: Modular Domain Adaptation for Conformer-Based Streaming ASR
Comments: Accepted to Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[74]  arXiv:2305.13332 (cross-list from eess.AS) [pdf, other]
Title: Conditional Online Learning for Keyword Spotting
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[75]  arXiv:2305.13330 (cross-list from eess.AS) [pdf, other]
Title: Unsupervised ASR via Cross-Lingual Pseudo-Labeling
Comments: under review
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)

Tue, 23 May 2023 (showing first 1 of 48 entries)

[76]  arXiv:2305.13262 [pdf, other]
Title: Modulation Extraction for LFO-driven Audio Effects
Comments: Accepted to DAFx 2023. Listening samples and plugins can be found at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)