Sound

Authors and titles for cs.SD in Nov 2021, skipping first 100

Total of 197 entries; showing entries 101-150.
[101]  arXiv:2111.03333 (cross-list from cs.CL) [pdf]
Title: Effective Cross-Utterance Language Modeling for Conversational Speech Recognition
Comments: 6 pages, 6 figures, and 4 tables. Accepted by 2022 International Joint Conference on Neural Networks (IJCNN 2022)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102]  arXiv:2111.03777 (cross-list from cs.CL) [pdf, other]
Title: Privacy attacks for automatic speech recognition acoustic models in a federated learning framework
Comments: Submitted to ICASSP 2022
Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 6972-6976
Subjects: Computation and Language (cs.CL); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103]  arXiv:2111.03945 (cross-list from cs.CL) [pdf, other]
Title: Towards Building ASR Systems for the Next Billion Users
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104]  arXiv:2111.04194 (cross-list from cs.CL) [pdf, other]
Title: Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105]  arXiv:2111.04823 (cross-list from cs.CL) [pdf, other]
Title: Cascaded Multilingual Audio-Visual Learning from Videos
Comments: Presented at Interspeech 2021. This version contains updated results using the YouCook-Japanese dataset
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[106]  arXiv:2111.05011 (cross-list from cs.LG) [pdf, other]
Title: RAVE: A variational autoencoder for fast and high-quality neural audio synthesis
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107]  arXiv:2111.05113 (cross-list from cs.CR) [pdf, other]
Title: Membership Inference Attacks Against Self-supervised Speech Models
Comments: Accepted to Interspeech 2022. Code will be available in the future
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108]  arXiv:2111.05128 (cross-list from cs.LG) [pdf, other]
Title: Losses, Dissonances, and Distortions
Comments: In the 5th Machine Learning for Creativity and Design Workshop at NeurIPS 2021
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[109]  arXiv:2111.05222 (cross-list from cs.CV) [pdf, other]
Title: Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition
Comments: Accepted in FG2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[110]  arXiv:2111.05890 (cross-list from cs.CV) [pdf, ps, other]
Title: Multimodal End-to-End Group Emotion Recognition using Cross-Modal Attention
Authors: Lev Evtodienko
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111]  arXiv:2111.05948 (cross-list from cs.CL) [pdf, other]
Title: Scaling ASR Improves Zero and Few Shot Learning
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112]  arXiv:2111.06310 (cross-list from cs.CL) [pdf, other]
Title: Self-Normalized Importance Sampling for Neural Language Modeling
Comments: Accepted at INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113]  arXiv:2111.07402 (cross-list from cs.CL) [pdf, other]
Title: Textless Speech Emotion Conversion using Discrete and Decomposed Representations
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114]  arXiv:2111.07454 (cross-list from cs.CL) [pdf, other]
Title: Towards Interpretability of Speech Pause in Dementia Detection using Adversarial Learning
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115]  arXiv:2111.07549 (cross-list from cs.CL) [pdf, other]
Title: Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116]  arXiv:2111.08046 (cross-list from cs.CV) [pdf, other]
Title: Beyond Mono to Binaural: Generating Binaural Audio from Mono Audio with Depth and Cross Modal Attention
Comments: To appear in WACV 2022. arXiv admin note: text overlap with arXiv:2108.04906
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117]  arXiv:2111.08137 (cross-list from cs.CL) [pdf, other]
Title: Joint Unsupervised and Supervised Training for Multilingual ASR
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118]  arXiv:2111.08191 (cross-list from cs.CL) [pdf, other]
Title: CoCA-MDD: A Coupled Cross-Attention based Framework for Streaming Mispronunciation Detection and Diagnosis
Comments: 5 pages, 4 figures, Accepted by INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119]  arXiv:2111.08380 (cross-list from cs.MM) [pdf, other]
Title: Video Background Music Generation with Controllable Music Transformer
Comments: Accepted to ACM Multimedia 2021. Project website at this https URL
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120]  arXiv:2111.08400 (cross-list from cs.CL) [pdf, other]
Title: Integrated Semantic and Phonetic Post-correction for Chinese Speech Recognition
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121]  arXiv:2111.09296 (cross-list from cs.CL) [pdf, other]
Title: XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122]  arXiv:2111.09771 (cross-list from cs.MM) [pdf, other]
Title: Transformer-S2A: Robust and Efficient Speech-to-Animation
Comments: Accepted by ICASSP 2022
Subjects: Multimedia (cs.MM); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123]  arXiv:2111.10157 (cross-list from cs.CL) [pdf, other]
Title: Lattention: Lattice-attention in ASR rescoring
Comments: Submitted to ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124]  arXiv:2111.10367 (cross-list from cs.CL) [pdf, other]
Title: SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech
Comments: Updated preprint for SLUE Benchmark v0.2; Toolkit link this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125]  arXiv:2111.10882 (cross-list from cs.CV) [pdf, other]
Title: Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Comments: Published in BMVC 2021, project page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126]  arXiv:2111.11703 (cross-list from cs.LG) [pdf, other]
Title: A Contextual Latent Space Model: Subsequence Modulation in Melodic Sequence
Authors: Taketo Akama
Comments: 22nd International Society for Music Information Retrieval Conference (ISMIR), 2021; 8 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[127]  arXiv:2111.12028 (cross-list from cs.CL) [pdf]
Title: Romanian Speech Recognition Experiments from the ROBIN Project
Comments: 12 pages, 3 figures, ConsILR2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128]  arXiv:2111.12890 (cross-list from cs.CV) [pdf, other]
Title: V2C: Visual Voice Cloning
Comments: 15 pages, 14 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129]  arXiv:2111.13486 (cross-list from cs.CY) [pdf, other]
Title: When Creators Meet the Metaverse: A Survey on Computational Arts
Comments: Submitted to ACM Computing Surveys, 36 pages
Subjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130]  arXiv:2111.14706 (cross-list from cs.CL) [pdf, other]
Title: ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet
Comments: Accepted at ICASSP 2022 (5 pages)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131]  arXiv:2111.14951 (cross-list from cs.HC) [pdf, other]
Title: Expressive Communication: A Common Framework for Evaluating Developments in Generative Models and Steering Interfaces
Comments: 15 pages, 6 figures, submitted to ACM Intelligent User Interfaces 2022 Conference
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132]  arXiv:2111.15016 (cross-list from cs.CL) [pdf, other]
Title: Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133]  arXiv:2111.15156 (cross-list from cs.CL) [pdf, other]
Title: Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency
Comments: Accepted for publication in the International Journal of Artificial Intelligence in Education (IJAIED)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134]  arXiv:2111.00009 (cross-list from eess.AS) [pdf, other]
Title: Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model
Comments: submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[135]  arXiv:2111.00030 (cross-list from eess.AS) [pdf, other]
Title: Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers
Comments: Submitted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA2021)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[136]  arXiv:2111.00127 (cross-list from eess.AS) [pdf, other]
Title: Cross-attention conformer for context modeling in speech enhancement for ASR
Comments: Will appear in IEEE-ASRU 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[137]  arXiv:2111.00242 (cross-list from eess.AS) [pdf]
Title: Self-Supervised Speech Denoising Using Only Noisy Audio Signals
Comments: 5 pages, 3 figures, 2 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[138]  arXiv:2111.00316 (cross-list from eess.AS) [pdf, other]
Title: Real-time Speaker counting in a cocktail party scenario using Attention-guided Convolutional Neural Network
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[139]  arXiv:2111.00320 (cross-list from eess.AS) [pdf, other]
Title: Speaker conditioning of acoustic models using affine transformation for multi-speaker speech recognition
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[140]  arXiv:2111.00764 (cross-list from eess.AS) [pdf, other]
Title: SNRi Target Training for Joint Speech Enhancement and Recognition
Comments: Submitted to Interspeech 2022 (v1 has been rejected from ICASSP 2022)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[141]  arXiv:2111.01320 (cross-list from eess.AS) [pdf, other]
Title: AVASpeech-SMAD: A Strongly Labelled Speech and Music Activity Detection Dataset with Label Co-Occurrence
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[142]  arXiv:2111.01326 (cross-list from eess.AS) [pdf, other]
Title: Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[143]  arXiv:2111.01652 (cross-list from eess.AS) [pdf, other]
Title: Design and Evaluation of Active Noise Control on Machinery Noise
Journal-ref: APSIPA 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Systems and Control (eess.SY)
[144]  arXiv:2111.01690 (cross-list from eess.AS) [pdf, other]
Title: Recent Advances in End-to-End Automatic Speech Recognition
Authors: Jinyu Li
Comments: Accepted at APSIPA Transactions on Signal and Information Processing
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[145]  arXiv:2111.01710 (cross-list from eess.AS) [pdf, other]
Title: Multi-input Architecture and Disentangled Representation Learning for Multi-dimensional Modeling of Music Similarity
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[146]  arXiv:2111.01914 (cross-list from eess.AS) [pdf, other]
Title: Reduction of Subjective Listening Effort for TV Broadcast Signals with Recurrent Neural Networks
Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing. This version is the authors' version and may vary from the final publication in details
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[147]  arXiv:2111.02363 (cross-list from eess.AS) [pdf, other]
Title: Deep Learning-based Non-Intrusive Multi-Objective Speech Assessment Model with Cross-Domain Features
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[148]  arXiv:2111.02392 (cross-list from eess.AS) [pdf, other]
Title: A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion
Comments: 5 pages, 2 figures, 2 tables. Accepted at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[149]  arXiv:2111.02674 (cross-list from eess.AS) [pdf, other]
Title: Voice Conversion Can Improve ASR in Very Low-Resource Settings
Comments: 5 pages, 4 tables, 2 figures. Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[150]  arXiv:2111.03482 (cross-list from eess.AS) [pdf, other]
Title: Target Speech Extraction: Independent Vector Extraction Guided by Supervised Speaker Identification
Comments: Modified version of the article accepted for publication in IEEE/ACM Transactions on Audio Speech and Language Processing journal. Original results unchanged, additional experiments presented, refined discussion and conclusions
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2295-2309, 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)