Sound

Authors and titles for cs.SD in Nov 2021, skipping first 75

[76]  arXiv:2111.12331 [pdf, other]
Title: An MAP Estimation for Between-Class Variance
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[77]  arXiv:2111.12531 [pdf, ps, other]
Title: Non-Intrusive Binaural Speech Intelligibility Prediction from Discrete Latent Representations
Comments: 4 pages + 1 refs; 1 figure; accepted at IEEE SPL (to appear)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[78]  arXiv:2111.12588 [pdf, other]
Title: Towards Cross-Cultural Analysis using Music Information Dynamics
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[79]  arXiv:2111.12761 [pdf, other]
Title: Semi-Supervised Audio Classification with Partially Labeled Data
Comments: To be presented at IEEE ISM 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80]  arXiv:2111.12869 [pdf, other]
Title: Polyphonic Sound Event Detection Using Capsule Neural Network on Multi-Type-Multi-Scale Time-Frequency Representation
Comments: Under review at ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81]  arXiv:2111.12986 [pdf, other]
Title: A-Muze-Net: Music Generation by Composing the Harmony based on the Generated Melody
Comments: Accepted for publication at MMM 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[82]  arXiv:2111.13457 [pdf, other]
Title: Semi-Supervised Music Tagging Transformer
Comments: International Society for Music Information Retrieval (ISMIR) 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83]  arXiv:2111.13694 [pdf, other]
Title: Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information
Comments: Submitted to ICASSP 2022, 5 pages, 2 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[84]  arXiv:2111.14203 [pdf]
Title: How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey
Comments: Abbreviated version of a longer survey under review
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[85]  arXiv:2111.14354 [pdf]
Title: Responding to Challenge Call of Machine Learning Model Development in Diagnosing Respiratory Disease Sounds
Authors: Negin Melek
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[86]  arXiv:2111.14479 [pdf, other]
Title: Mixed Precision DNN Quantization for Overlapped Speech Separation and Recognition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87]  arXiv:2111.14843 [pdf, other]
Title: Catch Me If You Hear Me: Audio-Visual Navigation in Complex Unmapped Environments with Moving Sounds
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[88]  arXiv:2111.15159 [pdf, other]
Title: CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[89]  arXiv:2111.15222 [pdf, other]
Title: SP-SEDT: Self-supervised Pre-training for Sound Event Detection Transformer
Comments: Submitted to Interspeech 2022; added experiments for Section 4
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90]  arXiv:2111.00161 (cross-list from cs.CL) [pdf, other]
Title: Pseudo-Labeling for Massively Multilingual Speech Recognition
Comments: Accepted to ICASSP 2022. New version has links to code/models + more training curves for larger model. (Fixed code link.)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91]  arXiv:2111.00400 (cross-list from cs.CL) [pdf, other]
Title: FANS: Fusing ASR and NLU for on-device SLU
Comments: Published in Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92]  arXiv:2111.00610 (cross-list from cs.CL) [pdf, other]
Title: Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93]  arXiv:2111.00976 (cross-list from cs.CL) [pdf, other]
Title: A transfer learning based approach for pronunciation scoring
Comments: ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94]  arXiv:2111.01024 (cross-list from cs.CV) [pdf, other]
Title: With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition
Comments: Accepted at BMVC 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95]  arXiv:2111.01272 (cross-list from cs.CL) [pdf, other]
Title: Sequence Transduction with Graph-based Supervision
Comments: Accepted for publication at IEEE ICASSP 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96]  arXiv:2111.02216 (cross-list from cs.CL) [pdf, other]
Title: Automatic Embedding of Stories Into Collections of Independent Media
Comments: 2 pages in main text + 1 page of references + 6 pages of appendices, 2 figures in main text + 3 figures in appendices, 1 algorithm in appendices; source code available at this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97]  arXiv:2111.02735 (cross-list from cs.CL) [pdf, other]
Title: A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
Comments: 5 pages, 2 figures, submitted to INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98]  arXiv:2111.02813 (cross-list from cs.LG) [pdf, other]
Title: WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Comments: Accepted to NeurIPS 2021 (Benchmark and Dataset Track); Code: this https URL; Data: this https URL
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99]  arXiv:2111.03146 (cross-list from cs.LG) [pdf, other]
Title: Generating Diverse Realistic Laughter for Interactive Art
Comments: Presented at Machine Learning for Creativity and Design workshop at NeurIPS 2021
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100]  arXiv:2111.03250 (cross-list from cs.CL) [pdf, other]
Title: Context-Aware Transformer Transducer for Speech Recognition
Comments: Accepted to ASRU 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)