Sound

Authors and titles for recent submissions

[ total of 25 entries: 1-25 ]

Thu, 26 Nov 2020

[1]  arXiv:2011.12818 [pdf, other]
Title: Phase retrieval with Bregman divergences: Application to audio signal recovery
Comments: in Proceedings of iTWIST'20, Paper-ID: 16, Nantes, France, December, 2-4, 2020
Subjects: Sound (cs.SD)
[2]  arXiv:2011.12754 [pdf, other]
Title: Feature Selection based on Principal Component Analysis for Underwater Source Localization by Deep Learning
Subjects: Sound (cs.SD); Signal Processing (eess.SP); Atmospheric and Oceanic Physics (physics.ao-ph)
[3]  arXiv:2011.12596 [pdf, other]
Title: MTCRNN: A multi-scale RNN for directed audio texture synthesis
Authors: M. Huzaifah, L. Wyse
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[4]  arXiv:2011.12536 [pdf, ps, other]
Title: Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding
Authors: Achintya kr. Sarkar, Zheng-Hua Tan (Senior Member, IEEE)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[5]  arXiv:2011.12461 [pdf, other]
Title: SAR-Net: A End-to-End Deep Speech Accent Recognition Network
Comments: 10 pages, 7 figures, journal
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)

Wed, 25 Nov 2020

[6]  arXiv:2011.12022 [pdf, other]
Title: Multi-Decoder DPRNN: High Accuracy Source Counting and Separation
Comments: Submitted to ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7]  arXiv:2011.11970 [pdf, other]
Title: A Novel Multimodal Music Genre Classifier using Hierarchical Attention and Convolutional Neural Network
Comments: 7 pages, 4 figures
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[8]  arXiv:2011.12063 (cross-list from eess.AS) [pdf, other]
Title: How Far Are We from Robust Voice Conversion: A Survey
Comments: Accepted by SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9]  arXiv:2011.11818 (cross-list from eess.AS) [pdf, other]
Title: Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[10]  arXiv:2011.11715 (cross-list from cs.CL) [pdf, other]
Title: Multi-task Language Modeling for Improving Speech Recognition of Rare Words
Comments: Submitted to ICASSP 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[11]  arXiv:2011.11671 (cross-list from eess.AS) [pdf, other]
Title: Streaming Multi-speaker ASR with RNN-T
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)

Tue, 24 Nov 2020

[12]  arXiv:2011.11436 [pdf, other]
Title: Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer
Authors: Mohammad Soltanian (1), Junaid Malik (1), Jenni Raitoharju (2), Alexandros Iosifidis (3), Serkan Kiranyaz (4), Moncef Gabbouj (1) ((1) Department of Computing Sciences, Tampere University, Finland, (2) Programme for Environmental Information, Finnish Environment Institute, Jyvaskyla, Finland, (3) Department of Electrical and Computer Engineering, Aarhus University, Denmark, (4) Electrical Engineering Department, Qatar University, Qatar)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[13]  arXiv:2011.10710 [pdf, other]
Title: Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification
Comments: Submitted to ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14]  arXiv:2011.11588 (cross-list from cs.CL) [pdf, other]
Title: The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
Comments: 14 pages, including references and supplementary material
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[15]  arXiv:2011.11564 (cross-list from eess.AS) [pdf, other]
Title: Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16]  arXiv:2011.11315 (cross-list from eess.AS) [pdf, other]
Title: End-to-end Silent Speech Recognition with Acoustic Sensing
Comments: To be presented at SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17]  arXiv:2011.10798 (cross-list from eess.AS) [pdf, other]
Title: A Better and Faster End-to-End Model for Streaming ASR
Comments: Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[18]  arXiv:2011.10706 (cross-list from eess.AS) [pdf, other]
Title: Deep Network Perceptual Losses for Speech Denoising
Comments: First two authors contributed equally, 6 pages, 4 PDF figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)

Mon, 23 Nov 2020

[19]  arXiv:2011.10233 [pdf, other]
Title: One Shot Learning for Speech Separation
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2011.10538 (cross-list from eess.AS) [pdf, other]
Title: Improving RNN-T ASR Accuracy Using Untranscribed Context Audio
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21]  arXiv:2011.10469 (cross-list from cs.LG) [pdf, other]
Title: Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 20 Nov 2020

[22]  arXiv:2011.09767 [pdf, other]
Title: Deep Residual Local Feature Learning for Speech Emotion Recognition
Comments: 12 pages, 5 figures, submitted for review
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[23]  arXiv:2011.09804 (cross-list from eess.AS) [pdf, other]
Title: TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos
Comments: 8 pages, 4 figures, accepted to SLT 2021, IEEE Spoken Language Technology Workshop
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Image and Video Processing (eess.IV)
[24]  arXiv:2011.09744 (cross-list from cs.LG) [pdf, other]
Title: End-To-End Dilated Variational Autoencoder with Bottleneck Discriminative Loss for Sound Morphing -- A Preliminary Study
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25]  arXiv:2011.09631 (cross-list from eess.AS) [pdf, other]
Title: Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Comments: Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)