Sound

Authors and titles for recent submissions

[ total of 119 entries: 1-25 | 26-50 | 51-75 | 76-100 | 101-119 ]

Mon, 18 Oct 2021

[1]  arXiv:2110.08213 [pdf, other]
Title: Towards Identity Preserving Normal to Dysarthric Voice Conversion
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[2]  arXiv:2110.08090 [pdf, other]
Title: Using DeepProbLog to perform Complex Event Processing on an Audio Stream
Comments: 8 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[3]  arXiv:2110.07607 [pdf, other]
Title: HumBugDB: A Large-scale Acoustic Mosquito Dataset
Comments: Accepted at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. 10 pages main, 39 pages including appendix. This paper accompanies the dataset found at this https URL with corresponding code at this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[4]  arXiv:2110.08250 (cross-list from cs.CL) [pdf, other]
Title: Direct simultaneous speech to speech translation
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[5]  arXiv:2110.08243 (cross-list from eess.AS) [pdf, other]
Title: Neural Dubber: Dubbing for Silent Videos According to Scripts
Comments: Accepted by NeurIPS 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[6]  arXiv:2110.08214 (cross-list from cs.CL) [pdf, other]
Title: Incremental Speech Synthesis For Speech-To-Speech Translation
Comments: Work-in-progress
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2110.07982 (cross-list from cs.CL) [pdf, other]
Title: Scribosermo: Fast Speech-to-Text models for German and other Languages
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8]  arXiv:2110.07957 (cross-list from eess.AS) [pdf, other]
Title: Don't speak too fast: The impact of data bias on self-supervised speech models
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[9]  arXiv:2110.07840 (cross-list from cs.CL) [pdf, other]
Title: ESPnet2-TTS: Extending the Edge of TTS Research
Comments: Submitted to ICASSP2022. Demo HP: this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10]  arXiv:2110.07749 (cross-list from cs.LG) [pdf, other]
Title: Attention-Free Keyword Spotting
Comments: Submitted to ICASSP-2022 (5 pages)
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Fri, 15 Oct 2021 (showing first 15 of 19 entries)

[11]  arXiv:2110.07393 [pdf, other]
Title: M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Comments: 5 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12]  arXiv:2110.07313 [pdf, other]
Title: Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Comments: 4 pages. Submitted to ICASSP in Oct 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13]  arXiv:2110.07311 [pdf, other]
Title: SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14]  arXiv:2110.07210 [pdf, other]
Title: Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[15]  arXiv:2110.07027 [pdf, other]
Title: Comparison of SVD and factorized TDNN approaches for speech to text
Comments: 4 pages, 1 figure, 3 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[16]  arXiv:2110.06999 [pdf, other]
Title: Study of positional encoding approaches for Audio Spectrogram Transformers
Comments: Submitted to ICASSP 2022. 5 pages, 3 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17]  arXiv:2110.07592 (cross-list from cs.CL) [pdf, other]
Title: Speech Toxicity Analysis: A New Spoken Language Processing Task
Comments: 5 pages, submitted to ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18]  arXiv:2110.07537 (cross-list from eess.AS) [pdf, ps, other]
Title: Toward Degradation-Robust Voice Conversion
Comments: Submitted to ICASSP 2022, equal contribution from first two authors
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[19]  arXiv:2110.07468 (cross-list from eess.AS) [pdf, other]
Title: SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
Comments: vocoder, generative adversarial network, singing voice synthesis
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[20]  arXiv:2110.07419 (cross-list from eess.AS) [pdf, other]
Title: Student-t Networks for Melody Estimation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[21]  arXiv:2110.07410 (cross-list from cs.LG) [pdf, other]
Title: Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Comments: 5 pages, 4 figures. Accepted at Detection and Classification of Acoustic Scenes and Events 2021 (DCASE2021)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22]  arXiv:2110.07354 (cross-list from cs.LG) [pdf, other]
Title: Music Playlist Title Generation: A Machine-Translation Approach
Comments: Proceedings of the 2nd Workshop on NLP for Music and Spoken Audio, 22nd International Society for Music Information Retrieval Conference (ISMIR)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23]  arXiv:2110.07274 (cross-list from cs.CL) [pdf, other]
Title: An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[24]  arXiv:2110.07216 (cross-list from eess.AS) [pdf, other]
Title: FedSpeech: Federated Text-to-Speech with Continual Learning
Comments: Accepted by IJCAI 2021
Journal-ref: 2021. Main Track. Pages 3829-3835
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25]  arXiv:2110.07205 (cross-list from eess.AS) [pdf, other]
Title: SpeechT5: Unified-Modal Encoder-Decoder Pre-training for Spoken Language Processing
Comments: work in process
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)