Sound

Authors and titles for cs.SD in Jun 2022, skipping first 25

[ total of 221 entries: 1-50 | 26-75 | 76-125 | 126-175 | 176-221 ]
[26]  arXiv:2206.05408 [pdf, other]
Title: Multi-instrument Music Synthesis with Spectrogram Diffusion
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27]  arXiv:2206.05876 [pdf, other]
Title: Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques
Comments: arXiv admin note: substantial text overlap with arXiv:2106.04492
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[28]  arXiv:2206.05929 [pdf, other]
Title: Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure
Comments: 5 pages, 3 figures, 3 tables, EUSIPCO 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29]  arXiv:2206.06057 [pdf, ps, other]
Title: Low-complexity deep learning frameworks for acoustic scene classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30]  arXiv:2206.06117 [pdf]
Title: Optimizing musical chord inversions using the cartesian coordinate system
Authors: Steve Mathew D A
Comments: 9 pages, 5 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31]  arXiv:2206.06126 [pdf, other]
Title: Robust Time Series Denoising with Learnable Wavelet Packet Transform
Comments: 15 pages, 13 figures, 8 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[32]  arXiv:2206.06573 [pdf, ps, other]
Title: Speech intelligibility of simulated hearing loss sounds and its prediction using the Gammachirp Envelope Similarity Index (GESI)
Comments: This paper was submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33]  arXiv:2206.06604 [pdf, other]
Title: WHIS: Hearing impairment simulator based on the gammachirp auditory filterbank
Authors: Toshio Irino
Comments: This paper was submitted to Trends in Hearing on Jun 5, 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34]  arXiv:2206.06680 [pdf, other]
Title: Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction
Comments: Proceedings of the ICML Expressive Vocalizations Workshop and Competition held in conjunction with the 39th International Conference on Machine Learning, Copyright 2022 by the author(s)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35]  arXiv:2206.06908 [pdf, other]
Title: LPCSE: Neural Speech Enhancement through Linear Predictive Coding
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36]  arXiv:2206.07176 [pdf, other]
Title: Frequency-centroid features for word recognition of non-native English speakers
Comments: Published in IEEE Irish Signals & Systems Conference (ISSC), 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[37]  arXiv:2206.07229 [pdf, other]
Title: Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
Comments: To appear in INTERSPEECH 2022. 5 pages, 4 figures. Substantial text overlap with arXiv:2110.03156
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38]  arXiv:2206.07288 [pdf, other]
Title: Streaming non-autoregressive model for any-to-many voice conversion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39]  arXiv:2206.07289 [pdf, other]
Title: Text-Aware End-to-end Mispronunciation Detection and Diagnosis
Comments: Rejected by Interspeech2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[40]  arXiv:2206.07293 [pdf, other]
Title: FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement
Comments: The paper has been accepted by ICASSP 2022. 5 pages, 2 figures, 5 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41]  arXiv:2206.07340 [pdf, other]
Title: On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems
Authors: Kai Li, Yi Luo
Comments: 5 pages, 1 figure
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42]  arXiv:2206.07347 [pdf, other]
Title: On the Use of Deep Mask Estimation Module for Neural Source Separation Systems
Authors: Kai Li, Xiaolin Hu, Yi Luo
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43]  arXiv:2206.07511 [pdf, other]
Title: Investigating Multi-Feature Selection and Ensembling for Audio Classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44]  arXiv:2206.07860 [pdf, other]
Title: EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning
Comments: Accepted by IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[45]  arXiv:2206.07956 [pdf, other]
Title: Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Comments: accepted by INTERSPEECH2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[46]  arXiv:2206.08007 [pdf, ps, other]
Title: DCASE 2022: Comparative Analysis Of CNNs For Acoustic Scene Classification Under Low-Complexity Considerations
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47]  arXiv:2206.08039 [pdf, ps, other]
Title: Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History
Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[48]  arXiv:2206.08170 [pdf, other]
Title: Adversarial Privacy Protection on Speech Enhancement
Comments: 5 pages, 6 figures
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[49]  arXiv:2206.08189 [pdf, other]
Title: Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2206.08233 [pdf, other]
Title: Event-related data conditioning for acoustic event classification
Comments: Accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[51]  arXiv:2206.08297 [pdf, other]
Title: GoodBye WaveNet -- A Language Model for Raw Audio with Context of 1/2 Million Samples
Authors: Prateek Verma
Comments: 12 pages, 1 figure. Technical Report at Stanford University. Ongoing Work
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[52]  arXiv:2206.08312 [pdf, other]
Title: SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Comments: Camera-ready version. Website: this https URL Project page: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[53]  arXiv:2206.08317 [pdf, other]
Title: Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Comments: 5 pages, 3 figures, accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[54]  arXiv:2206.09131 [pdf, other]
Title: Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion
Comments: Accepted by Odyssey 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[55]  arXiv:2206.09142 [pdf, other]
Title: Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression
Comments: 5 pages, accepted by ICML Exvo workshop
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[56]  arXiv:2206.09298 [pdf, ps, other]
Title: GMM based multi-stage Wiener filtering for low SNR speech enhancement
Comments: 5 pages, 3 figures, submitted to a conference
Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[57]  arXiv:2206.09920 [pdf, other]
Title: WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis
Authors: Yi Wang, Yi Si
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[58]  arXiv:2206.10175 [pdf, other]
Title: A Multi-grained based Attention Network for Semi-supervised Sound Event Detection
Journal-ref: INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[59]  arXiv:2206.10256 [pdf, other]
Title: Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[60]  arXiv:2206.10349 [pdf, ps, other]
Title: Joint Analysis of Acoustic Scenes and Sound Events Based on Multitask Learning with Dynamic Weight Adaptation
Comments: Submitted to Acoustical Science and Technology
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61]  arXiv:2206.10421 [pdf, other]
Title: Rethinking Audio-visual Synchronization for Active Speaker Detection
Comments: Accepted by IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2022)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[62]  arXiv:2206.10695 [pdf, other]
Title: Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations
Comments: Accepted by the ICML Expressive Vocalizations Workshop and Competition 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63]  arXiv:2206.10805 [pdf, other]
Title: Jointist: Joint Learning for Multi-instrument Transcription and Its Applications
Comments: Submitted to ISMIR
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64]  arXiv:2206.11049 [pdf, other]
Title: Dynamic Restrained Uncertainty Weighting Loss for Multitask Learning of Vocal Expression
Comments: 5 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[65]  arXiv:2206.11066 [pdf, other]
Title: Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals
Comments: Accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66]  arXiv:2206.11260 [pdf, other]
Title: Few-shot Long-Tailed Bird Audio Recognition
Comments: LifeCLEF2022 (best paper award)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[67]  arXiv:2206.11567 [pdf]
Title: Restoring speech intelligibility for hearing aid users with deep learning
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[68]  arXiv:2206.11632 [pdf, other]
Title: Formant Estimation and Tracking using Probabilistic Heat-Maps
Comments: interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69]  arXiv:2206.11643 [pdf, ps, other]
Title: Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus
Comments: Interspeech 2022 Accepted. arXiv admin note: text overlap with arXiv:2111.14479
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[70]  arXiv:2206.11699 [pdf, ps, other]
Title: The SJTU X-LANCE Lab System for CNSRC 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71]  arXiv:2206.11968 [pdf, other]
Title: Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
Journal-ref: Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72]  arXiv:2206.12038 [pdf, other]
Title: BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
Comments: Submitted to HEAR-PMLR 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[73]  arXiv:2206.12229 [pdf, other]
Title: Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech
Comments: Accepted to IEEE SLT 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[74]  arXiv:2206.12230 [pdf, other]
Title: Deformable CNN and Imbalance-Aware Feature Learning for Singing Technique Classification
Comments: Accepted to INTERSPEECH2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[75]  arXiv:2206.12320 [pdf, other]
Title: PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis
Comments: 8 pages, 4 figures, Text, Speech and Dialogue 2022 Conference
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
