Sound

Authors and titles for recent submissions, skipping first 59

[ total of 98 entries: 1-25 | 10-34 | 35-59 | 60-84 | 85-98 ]

Mon, 29 May 2023 (continued, showing last 2 of 13 entries)

[60]  arXiv:2305.16371 (cross-list from cs.CL) [pdf, other]
Title: INTapt: Information-Theoretic Adversarial Prompt Tuning for Enhanced Non-Native Speech Recognition
Comments: ACL2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61]  arXiv:2305.16342 (cross-list from cs.CL) [pdf, other]
Title: InterFormer: Interactive Local and Global Features Fusion for Automatic Speech Recognition
Comments: Accepted by Interspeech 2023
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 26 May 2023

[62]  arXiv:2305.16263 [pdf, other]
Title: Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
Comments: Accepted to INTERSPEECH 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[63]  arXiv:2305.16070 [pdf, other]
Title: Visualizing data augmentation in deep speaker recognition
Comments: to be published in INTERSPEECH 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64]  arXiv:2305.16043 [pdf, other]
Title: Ordered and Binary Speaker Embedding
Comments: to be published in INTERSPEECH 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[65]  arXiv:2305.15905 [pdf, other]
Title: Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7
Comments: DCASE 2023 task 7 technical report
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[66]  arXiv:2305.15898 [pdf, other]
Title: Room Impulse Response Estimation in a Multiple Source Environment
Comments: 2023 AES International Conference on Spatial and Immersive Audio
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[67]  arXiv:2305.15859 [pdf, ps, other]
Title: Anomalous Sound Detection Based on Sound Separation
Comments: Accepted to INTERSPEECH2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68]  arXiv:2305.15758 [pdf]
Title: Towards Solving Cocktail-Party: The First Method to Build a Realistic Dataset with Ground Truths for Speech Separation
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69]  arXiv:2305.15719 [pdf, other]
Title: Efficient Neural Music Generation
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[70]  arXiv:2305.15601 [pdf, ps, other]
Title: Metamathematics of Algorithmic Composition
Authors: Michael Gogins
Comments: 15 pages, 0 figures. Comments are very welcome
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71]  arXiv:2305.15571 [pdf, other]
Title: Sound Design Strategies for Latent Audio Space Explorations Using Deep Learning Architectures
Comments: In Proceedings of Sound and Music Computing 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72]  arXiv:2305.16286 (cross-list from eess.AS) [pdf, other]
Title: Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition
Comments: Accepted by Interspeech; 5 pages, 1 figure, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[73]  arXiv:2305.16107 (cross-list from cs.CL) [pdf, other]
Title: VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation
Comments: Work in progress
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74]  arXiv:2305.16093 (cross-list from cs.CL) [pdf, other]
Title: End-to-End Simultaneous Speech Translation with Differentiable Segmentation
Comments: Accepted at ACL 2023 findings
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75]  arXiv:2305.16076 (cross-list from eess.AS) [pdf, other]
Title: Transfer Learning for Personality Perception via Speech Emotion Recognition
Comments: Accepted to INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[76]  arXiv:2305.16065 (cross-list from eess.AS) [pdf, other]
Title: ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition
Comments: Accepted to INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[77]  arXiv:2305.16049 (cross-list from cs.CV) [pdf, other]
Title: CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition
Comments: to be published in INTERSPEECH 2023
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78]  arXiv:2305.15816 (cross-list from eess.AS) [pdf, other]
Title: DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion
Comments: 23 pages, 10 figures, 17 tables, under review
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[79]  arXiv:2305.15760 (cross-list from cs.CL) [pdf, other]
Title: Svarah: Evaluating English ASR Systems on Indian Accents
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80]  arXiv:2305.15663 (cross-list from cs.CL) [pdf, other]
Title: Mixture-of-Expert Conformer for Streaming Multilingual ASR
Comments: Accepted to Interspeech 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81]  arXiv:2305.15518 (cross-list from eess.AS) [pdf, other]
Title: Spoofing Attacker Also Benefits from Self-Supervised Pretrained Model
Comments: Accepted to INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Thu, 25 May 2023 (showing first 3 of 17 entries)

[82]  arXiv:2305.15127 [pdf, other]
Title: PLCMOS -- a data-driven non-intrusive metric for the evaluation of packet loss concealment algorithms
Comments: to appear: INTERSPEECH 2023, associated model release: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83]  arXiv:2305.15055 [pdf, other]
Title: Iteratively Improving Speech Recognition and Voice Conversion
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[84]  arXiv:2305.14867 [pdf, other]
Title: Interactive Neural Resonators
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)