Audio and Speech Processing

Authors and titles for recent submissions

[ total of 48 entries: 1-25 | 26-48 ]

Tue, 11 May 2021

[1]  arXiv:2105.04310 [pdf, other]
Title: Study on the temporal pooling used in deep neural networks for speaker verification
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2]  arXiv:2105.03643 [pdf, other]
Title: Latency-Controlled Neural Architecture Search for Streaming Speech Recognition
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[3]  arXiv:2105.03583 [pdf]
Title: Domestic activities clustering from audio recordings using convolutional capsule autoencoder network
Comments: 5 pages, 2 figures, 5 tables, Accepted by IEEE ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[4]  arXiv:2105.03544 [pdf, other]
Title: Test-Time Adaptation Toward Personalized Speech Enhancement: Zero-Shot Learning with Knowledge Distillation
Authors: Sunwoo Kim, Minje Kim
Comments: 5 pages, 5 figures, under review
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[5]  arXiv:2105.03542 [pdf, other]
Title: Zero-Shot Personalized Speech Enhancement through Speaker-Informed Model Selection
Comments: 5 pages, 3 figures, submitted to 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[6]  arXiv:2105.04489 (cross-list from cs.CV) [pdf, other]
Title: Spoken Moments: Learning Joint Audio-Visual Representations from Video Descriptions
Comments: To appear at CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2105.04488 (cross-list from cs.SD) [pdf, other]
Title: A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment
Comments: To be published in ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8]  arXiv:2105.04458 (cross-list from cs.SD) [pdf, other]
Title: Learning Robust Latent Representations for Controllable Speech Synthesis
Comments: Accepted in ACL2021 Findings
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9]  arXiv:2105.04309 (cross-list from cs.SD) [pdf, other]
Title: Multi-modal Conditional Bounding Box Regression for Music Score Following
Comments: Accepted for publication in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[10]  arXiv:2105.04124 (cross-list from cs.SD) [pdf, other]
Title: MASS: Multi-task Anthropomorphic Speech Synthesis Framework
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11]  arXiv:2105.04090 (cross-list from cs.SD) [pdf, other]
Title: MuseMorphose: Full-Song and Fine-Grained Music Style Transfer with Just One Transformer VAE
Comments: Preprint. 26 pages, 7 figures, and 8 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[12]  arXiv:2105.04079 (cross-list from cs.SD) [pdf, ps, other]
Title: Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method
Comments: 5 pages, 3 figures, accepted for European Signal Processing Conference 2021 (EUSIPCO 2021)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13]  arXiv:2105.04065 (cross-list from cs.SD) [pdf, other]
Title: Voice activity detection in the wild: A data-driven approach using teacher-student training
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1542-1555, 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14]  arXiv:2105.03842 (cross-list from cs.CL) [pdf, other]
Title: FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15]  arXiv:2105.03809 (cross-list from eess.SP) [pdf, other]
Title: Superresolution photoacoustic tomography using random speckle illumination and second order moments
Comments: 9 pages, 5 figures
Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV); Medical Physics (physics.med-ph)
[16]  arXiv:2105.03716 (cross-list from cs.CL) [pdf, ps, other]
Title: Continuous representations of intents for dialogue systems
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 10 May 2021

[17]  arXiv:2105.03337 [pdf, other]
Title: Online Acoustic System Identification Exploiting Kalman Filtering and an Adaptive Impulse Response Subspace Model
Subjects: Audio and Speech Processing (eess.AS)
[18]  arXiv:2105.02911 [pdf, other]
Title: Weakly Supervised Source-Specific Sound Level Estimation in Noisy Soundscapes
Comments: 5 pages, 3 figures, WASPAA 2021 submission preprint
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19]  arXiv:2105.03409 (cross-list from cs.CL) [pdf, other]
Title: A Benchmarking on Cloud based Speech-To-Text Services for French Speech and Background Noise Effect
Comments: 6th National Conference on Practical Applications of Artificial Intelligence, 2021, Bordeaux, France
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2105.03070 (cross-list from cs.CL) [pdf, other]
Title: SpeechNet: A Universal Modularized Model for Speech Processing Tasks
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[21]  arXiv:2105.03036 (cross-list from cs.SD) [pdf, other]
Title: SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts
Comments: 5 pages, 2 figures. Submitted to Interspeech 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[22]  arXiv:2105.03010 (cross-list from cs.CL) [pdf, ps, other]
Title: Efficient Weight factorization for Multilingual Speech Recognition
Comments: Submitted to Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 7 May 2021 (showing first 3 of 11 entries)

[23]  arXiv:2105.02592 [pdf, other]
Title: USM-SED - A Dataset for Polyphonic Sound Event Detection in Urban Sound Monitoring Scenarios
Authors: Jakob Abeßer
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24]  arXiv:2105.02469 [pdf, other]
Title: Point Cloud Audio Processing
Comments: Submitted to WASPAA 2021, Code: this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[25]  arXiv:2105.02446 [pdf, other]
Title: DiffSinger: Diffusion Acoustic Model for Singing Voice Synthesis
Comments: Acoustic Model; Shallow Diffusion Mechanism
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)