Sound

Authors and titles for cs.SD in Jun 2022

[ total of 221 entries: 1-221 ]
[1]  arXiv:2206.00208 [pdf, other]
Title: AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation
Comments: Accepted by ISCSLP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:2206.00393 [pdf, other]
Title: Towards Generalisable Audio Representations for Audio-Visual Navigation
Comments: CVPR 2022 Embodied AI Workshop
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[3]  arXiv:2206.00454 [pdf, other]
Title: Towards Context-Aware Neural Performance-Score Synchronisation
Authors: Ruchit Agrawal
Comments: PhD Thesis, Queen Mary University of London (190 pages)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[4]  arXiv:2206.00635 [pdf, other]
Title: Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition
Journal-ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[5]  arXiv:2206.00901 [pdf]
Title: Musical Instrument Recognition by XGBoost Combining Feature Fusion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6]  arXiv:2206.01071 [pdf, other]
Title: Partitura: A Python Package for Symbolic Music Processing
Journal-ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[7]  arXiv:2206.01104 [pdf, other]
Title: The match file format: Encoding Alignments between Scores and Performances
Journal-ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[8]  arXiv:2206.01305 [pdf, other]
Title: The Musical Arrow of Time -- The Role of Temporal Asymmetry in Music and Its Organicist Implications
Authors: Qi Xu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9]  arXiv:2206.01542 [pdf, other]
Title: Detecting the Severity of Major Depressive Disorder from Speech: A Novel HARD-Training Methodology
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[10]  arXiv:2206.02211 [pdf, other]
Title: Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
Comments: Submitted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[11]  arXiv:2206.02246 [pdf, other]
Title: Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
Comments: Accepted to Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[12]  arXiv:2206.02284 [pdf, other]
Title: Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator
Comments: MICCAI 2022 (early accept, Oral Presentation ~3%)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[13]  arXiv:2206.02671 [pdf, ps, other]
Title: Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Future Audio-Visual Hearing Aids
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[14]  arXiv:2206.03065 [pdf, other]
Title: Universal Speech Enhancement with Score-based Diffusion
Comments: 24 pages, 6 figures; includes appendix; examples in this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15]  arXiv:2206.03351 [pdf, other]
Title: AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16]  arXiv:2206.03393 [pdf, other]
Title: Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17]  arXiv:2206.04006 [pdf, other]
Title: Few-Shot Audio-Visual Learning of Environment Acoustics
Comments: Accepted to NeurIPS 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18]  arXiv:2206.04658 [pdf, other]
Title: BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Comments: Listen to audio samples from BigVGAN at: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19]  arXiv:2206.04769 [pdf, other]
Title: CLAP: Learning Audio Concepts From Natural Language Supervision
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2206.04780 [pdf, other]
Title: Speak Like a Dog: Human to Non-human creature Voice Conversion
Comments: 5 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21]  arXiv:2206.04805 [pdf, other]
Title: Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022
Comments: Submitted to CEUR-WS under LifeCLEF for the BirdCLEF 2022 challenge as a working note
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22]  arXiv:2206.04962 [pdf, other]
Title: Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation
Comments: arXiv admin note: text overlap with arXiv:2112.11142
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23]  arXiv:2206.04984 [pdf, other]
Title: Zero-Shot Audio Classification using Image Embeddings
Comments: Accepted to the European Signal Processing Conference (EUSIPCO) 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2206.05018 [pdf, ps, other]
Title: Going Beyond the Cookie Theft Picture Test: Detecting Cognitive Impairments using Acoustic Features
Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[25]  arXiv:2206.05286 [src]
Title: AHD ConvNet for Speech Emotion Classification
Comments: Wrong authors quoted
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[26]  arXiv:2206.05408 [pdf, other]
Title: Multi-instrument Music Synthesis with Spectrogram Diffusion
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27]  arXiv:2206.05876 [pdf, other]
Title: Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques
Comments: arXiv admin note: substantial text overlap with arXiv:2106.04492
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[28]  arXiv:2206.05929 [pdf, other]
Title: Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure
Comments: 5 pages, 3 figures, 3 tables, EUSIPCO 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29]  arXiv:2206.06057 [pdf, ps, other]
Title: Low-complexity deep learning frameworks for acoustic scene classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30]  arXiv:2206.06117 [pdf]
Title: Optimizing musical chord inversions using the cartesian coordinate system
Authors: Steve Mathew D A
Comments: 9 pages, 5 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31]  arXiv:2206.06126 [pdf, other]
Title: Robust Time Series Denoising with Learnable Wavelet Packet Transform
Comments: 15 pages, 13 figures, 8 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[32]  arXiv:2206.06573 [pdf, ps, other]
Title: Speech intelligibility of simulated hearing loss sounds and its prediction using the Gammachirp Envelope Similarity Index (GESI)
Comments: This paper was submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33]  arXiv:2206.06604 [pdf, other]
Title: WHIS: Hearing impairment simulator based on the gammachirp auditory filterbank
Authors: Toshio Irino
Comments: This paper was submitted to Trends in Hearing on Jun 5, 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34]  arXiv:2206.06680 [pdf, other]
Title: Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction
Comments: Proceedings of the ICML Expressive Vocalizations Workshop and Competition held in conjunction with the 39th International Conference on Machine Learning, Copyright 2022 by the author(s)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35]  arXiv:2206.06908 [pdf, other]
Title: LPCSE: Neural Speech Enhancement through Linear Predictive Coding
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36]  arXiv:2206.07176 [pdf, other]
Title: Frequency-centroid features for word recognition of non-native English speakers
Comments: Published in IEEE Irish Signals & Systems Conference (ISSC), 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[37]  arXiv:2206.07229 [pdf, other]
Title: Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
Comments: To appear in INTERSPEECH 2022. 5 pages, 4 figures. Substantial text overlap with arXiv:2110.03156
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38]  arXiv:2206.07288 [pdf, other]
Title: Streaming non-autoregressive model for any-to-many voice conversion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39]  arXiv:2206.07289 [pdf, other]
Title: Text-Aware End-to-end Mispronunciation Detection and Diagnosis
Comments: Rejected by Interspeech2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[40]  arXiv:2206.07293 [pdf, other]
Title: FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement
Comments: The paper has been accepted by ICASSP 2022. 5 pages, 2 figures, 5 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41]  arXiv:2206.07340 [pdf, other]
Title: On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems
Authors: Kai Li, Yi Luo
Comments: 5 pages, 1 figure
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42]  arXiv:2206.07347 [pdf, other]
Title: On the Use of Deep Mask Estimation Module for Neural Source Separation Systems
Authors: Kai Li, Xiaolin Hu, Yi Luo
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43]  arXiv:2206.07511 [pdf, other]
Title: Investigating Multi-Feature Selection and Ensembling for Audio Classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44]  arXiv:2206.07860 [pdf, other]
Title: EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning
Comments: Accepted by IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[45]  arXiv:2206.07956 [pdf, other]
Title: Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Comments: accepted by INTERSPEECH2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[46]  arXiv:2206.08007 [pdf, ps, other]
Title: DCASE 2022: Comparative Analysis Of CNNs For Acoustic Scene Classification Under Low-Complexity Considerations
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47]  arXiv:2206.08039 [pdf, ps, other]
Title: Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History
Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[48]  arXiv:2206.08170 [pdf, other]
Title: Adversarial Privacy Protection on Speech Enhancement
Comments: 5 pages, 6 figures
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[49]  arXiv:2206.08189 [pdf, other]
Title: Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2206.08233 [pdf, other]
Title: Event-related data conditioning for acoustic event classification
Comments: Accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[51]  arXiv:2206.08297 [pdf, other]
Title: GoodBye WaveNet -- A Language Model for Raw Audio with Context of 1/2 Million Samples
Authors: Prateek Verma
Comments: 12 pages, 1 figure. Technical Report at Stanford University. Ongoing Work
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[52]  arXiv:2206.08312 [pdf, other]
Title: SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning
Comments: Website: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[53]  arXiv:2206.08317 [pdf, other]
Title: Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition
Comments: 5 pages, 3 figures, accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[54]  arXiv:2206.09131 [pdf, other]
Title: Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion
Comments: Accepted by Odyssey 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[55]  arXiv:2206.09142 [pdf, other]
Title: Redundancy Reduction Twins Network: A Training framework for Multi-output Emotion Regression
Comments: 5 pages, accepted by ICML Exvo workshop
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[56]  arXiv:2206.09298 [pdf, ps, other]
Title: GMM based multi-stage Wiener filtering for low SNR speech enhancement
Comments: 5 pages, 3 figures, submitted to a conference
Subjects: Sound (cs.SD); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[57]  arXiv:2206.09920 [pdf, other]
Title: WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis
Authors: Yi Wang, Yi Si
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
[58]  arXiv:2206.10175 [pdf, other]
Title: A Multi-grained based Attention Network for Semi-supervised Sound Event Detection
Journal-ref: INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[59]  arXiv:2206.10256 [pdf, other]
Title: Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS
Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[60]  arXiv:2206.10349 [pdf, ps, other]
Title: Joint Analysis of Acoustic Scenes and Sound Events Based on Multitask Learning with Dynamic Weight Adaptation
Comments: Submitted to Acoustical Science and Technology
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61]  arXiv:2206.10421 [pdf, other]
Title: Rethinking Audio-visual Synchronization for Active Speaker Detection
Comments: Accepted by IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2022)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[62]  arXiv:2206.10695 [pdf, other]
Title: Exploring the Effectiveness of Self-supervised Learning and Classifier Chains in Emotion Recognition of Nonverbal Vocalizations
Comments: Accepted by the ICML Expressive Vocalizations Workshop and Competition 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63]  arXiv:2206.10805 [pdf, other]
Title: Jointist: Joint Learning for Multi-instrument Transcription and Its Applications
Comments: Submitted to ISMIR
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64]  arXiv:2206.11049 [pdf, other]
Title: Dynamic Restrained Uncertainty Weighting Loss for Multitask Learning of Vocal Expression
Comments: 5 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[65]  arXiv:2206.11066 [pdf, other]
Title: Radio2Speech: High Quality Speech Recovery from Radio Frequency Signals
Comments: Accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[66]  arXiv:2206.11260 [pdf, other]
Title: Few-shot Long-Tailed Bird Audio Recognition
Comments: LifeCLEF2022 (best paper award)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[67]  arXiv:2206.11567 [pdf]
Title: Restoring speech intelligibility for hearing aid users with deep learning
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[68]  arXiv:2206.11632 [pdf, other]
Title: Formant Estimation and Tracking using Probabilistic Heat-Maps
Comments: Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69]  arXiv:2206.11643 [pdf, ps, other]
Title: Towards Green ASR: Lossless 4-bit Quantization of a Hybrid TDNN System on the 300-hr Switchboard Corpus
Comments: Interspeech 2022 Accepted. arXiv admin note: text overlap with arXiv:2111.14479
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[70]  arXiv:2206.11699 [pdf, ps, other]
Title: The SJTU X-LANCE Lab System for CNSRC 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71]  arXiv:2206.11968 [pdf, other]
Title: Comparing supervised and self-supervised embedding for ExVo Multi-Task learning track
Journal-ref: Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72]  arXiv:2206.12038 [pdf, other]
Title: BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping
Comments: Submitted to HEAR-PMLR 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[73]  arXiv:2206.12229 [pdf, other]
Title: Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech
Comments: Accepted to IEEE SLT 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[74]  arXiv:2206.12230 [pdf, other]
Title: Deformable CNN and Imbalance-Aware Feature Learning for Singing Technique Classification
Comments: Accepted to INTERSPEECH2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[75]  arXiv:2206.12320 [pdf, other]
Title: PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis
Comments: 8 pages, 4 figures, Text, Speech and Dialogue 2022 Conference
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[76]  arXiv:2206.12469 [pdf, other]
Title: Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[77]  arXiv:2206.12494 [pdf, other]
Title: Multitask vocal burst modeling with ResNets and pre-trained paralinguistic Conformers
Comments: To be published in the ICML Expressive Vocalizations Workshop & Competition 2022 (this https URL)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[78]  arXiv:2206.12513 [pdf, other]
Title: Domain Generalization with Relaxed Instance Frequency-wise Normalization for Multi-device Acoustic Scene Classification
Comments: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[79]  arXiv:2206.12559 [pdf, other]
Title: Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[80]  arXiv:2206.12563 [pdf, other]
Title: Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
Comments: To be published at the ICML Expressive Vocalizations Workshop and Competition (ExVo Generate) held in conjunction with the 39th International Conference on Machine Learning
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[81]  arXiv:2206.12568 [pdf, other]
Title: Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction
Journal-ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[82]  arXiv:2206.12662 [pdf, other]
Title: Synthesizing Personalized Non-speech Vocalization from Discrete Speech Representations
Authors: Chin-Cheng Hsu
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[83]  arXiv:2206.12829 [pdf, other]
Title: On Comparison of Encoders for Attention based End to End Speech Recognition in Standalone and Rescoring Mode
Comments: Accepted at SPCOM 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[84]  arXiv:2206.13021 [pdf, other]
Title: Speak Like a Professional: Increasing Speech Intelligibility by Mimicking Professional Announcer Voice with Voice Conversion
Comments: Accepted at INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85]  arXiv:2206.13071 [pdf, other]
Title: Uncertainty Calibration for Deep Audio Classifiers
Comments: Accepted by InterSpeech 2022, the first two authors contributed equally
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[86]  arXiv:2206.13085 [pdf, other]
Title: Sound Model Factory: An Integrated System Architecture for Generative Audio Modelling
Journal-ref: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) (pp. 308-322). Springer, Cham. 2022
Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[87]  arXiv:2206.13101 [pdf, other]
Title: SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning
Comments: This paper is accepted by Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[88]  arXiv:2206.13110 [pdf, other]
Title: Sequence-level Speaker Change Detection with Difference-based Continuous Integrate-and-fire
Comments: Signal Processing Letters 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89]  arXiv:2206.13136 [pdf]
Title: A two-stage full-band speech enhancement model with effective spectral compression mapping
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90]  arXiv:2206.13476 [pdf, other]
Title: Impact of Acoustic Event Tagging on Scene Classification in a Multi-Task Learning Framework
Comments: Accepted at ISCA Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[91]  arXiv:2206.13611 [pdf, other]
Title: ClearBuds: Wireless Binaural Earbuds for Learning-Based Speech Enhancement
Comments: 12 pages, Published in Mobisys 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92]  arXiv:2206.13689 [pdf, other]
Title: Tiny-Sepformer: A Tiny Time-Domain Transformer Network for Speech Separation
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93]  arXiv:2206.13691 [pdf, other]
Title: Dummy Prototypical Networks for Few-Shot Open-Set Keyword Spotting
Comments: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[94]  arXiv:2206.13700 [pdf, other]
Title: Domain Agnostic Few-shot Learning for Speaker Verification
Comments: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[95]  arXiv:2206.13708 [pdf, other]
Title: Personalized Keyword Spotting through Multi-task Learning
Comments: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[96]  arXiv:2206.13817 [pdf, other]
Title: Comparison of Speech Representations for the MOS Prediction System
Comments: 5 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[97]  arXiv:2206.13909 [pdf, other]
Title: QTI Submission to DCASE 2021: residual normalization for device-imbalanced acoustic scene classification with efficient design
Comments: tech report; won 1st place in DCASE2021 challenge. arXiv admin note: substantial text overlap with arXiv:2111.06531
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[98]  arXiv:2206.13979 [pdf, other]
Title: Attack Agnostic Dataset: Towards Generalization and Stabilization of Audio DeepFake Detection
Comments: Proceedings of INTERSPEECH 2022 (Updated version: corrected ASVspoof dataset description)
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[99]  arXiv:2206.14659 [pdf, other]
Title: Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[100]  arXiv:2206.14723 [pdf, other]
Title: DrumGAN VST: A Plugin for Drum Sound Analysis/Synthesis With Autoencoding Generative Adversarial Networks
Comments: 7 pages, 2 figures, 3 tables, ICML2022 Machine Learning for Audio Synthesis (MLAS) Workshop, for sound examples visit this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[101]  arXiv:2206.15027 [pdf, other]
Title: Interpretable Melody Generation from Lyrics with Discrete-Valued Adversarial Training
Comments: 3 pages, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102]  arXiv:2206.15056 [pdf, other]
Title: FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition
Comments: Accepted for Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[103]  arXiv:2206.15067 [pdf, other]
Title: Language Model-Based Emotion Prediction Methods for Emotional Speech Synthesis Systems
Comments: Accepted by INTERSPEECH2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104]  arXiv:2206.15155 [pdf, other]
Title: An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions
Comments: Accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105]  arXiv:2206.15219 [pdf, ps, other]
Title: libACA, pyACA, and ACA-Code: Audio Content Analysis in 3 Languages
Authors: Alexander Lerch
Comments: Preprint submitted to "Software Impacts"
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[106]  arXiv:2206.15276 [pdf, other]
Title: R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[107]  arXiv:2206.15291 [pdf, other]
Title: Sonification as a Reliable Alternative to Conventional Visual Surgical Navigation
Comments: 19 pages, 7 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108]  arXiv:2206.15423 [pdf, other]
Title: Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain
Comments: Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[109]  arXiv:2206.15426 [pdf]
Title: Volume-Independent Music Matching by Frequency Spectrum Comparison
Authors: Anthony Lee
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[110]  arXiv:2206.01495 (cross-list from cs.LG) [pdf, other]
Title: Constraining Gaussian processes for physics-informed acoustic emission mapping
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111]  arXiv:2206.02050 (cross-list from cs.CV) [pdf, other]
Title: Learning Speaker-specific Lip-to-Speech Generation
Comments: Accepted at ICPR 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112]  arXiv:2206.02187 (cross-list from cs.CV) [pdf, other]
Title: M2FNet: Multi-modal Fusion Network for Emotion Recognition in Conversation
Comments: Accepted for publication in the 5th Multimodal Learning and Applications (MULA) Workshop at CVPR 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113]  arXiv:2206.03112 (cross-list from cs.LG) [pdf]
Title: Singapore Soundscape Site Selection Survey (S5): Identification of Characteristic Soundscapes of Singapore via Weighted k-means Clustering
Comments: 23 pages, 8 figures. Submitted to Sustainability
Journal-ref: MDPI Sustainability. 2022; 14(12):7485
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114]  arXiv:2206.03173 (cross-list from cs.CL) [pdf, other]
Title: Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation
Comments: Accepted by IJCAI-ECAI 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115]  arXiv:2206.03318 (cross-list from cs.CL) [pdf, other]
Title: LegoNN: Building Modular Encoder-Decoder Models
Comments: 13 pages; Submitted to TASLP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116]  arXiv:2206.04523 (cross-list from cs.CL) [pdf, other]
Title: Face-Dubbing++: Lip-Synchronous, Voice Preserving Translation of Videos
Subjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[117]  arXiv:2206.04571 (cross-list from cs.CL) [pdf, other]
Title: Revisiting End-to-End Speech-to-Text Translation From Scratch
Comments: ICML
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118]  arXiv:2206.04922 (cross-list from cs.CL) [pdf, other]
Title: A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation
Comments: 5 pages, 5 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119]  arXiv:2206.05053 (cross-list from cs.HC) [pdf, other]
Title: Coswara: A website application enabling COVID-19 screening by analysing respiratory sound samples and health symptoms
Journal-ref: Interspeech, 2022
Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[120]  arXiv:2206.07373 (cross-list from cs.CL) [pdf, other]
Title: NatiQ: An End-to-end Text-to-Speech System for Arabic
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121]  arXiv:2206.07458 (cross-list from cs.CV) [pdf, other]
Title: VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection
Comments: Accepted by ECCV 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122]  arXiv:2206.07627 (cross-list from cs.CL) [pdf, ps, other]
Title: Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech
Comments: to be published in Proceedings of INTERSPEECH 2022
Journal-ref: Interspeech 2022, 1831-1835
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123]  arXiv:2206.07684 (cross-list from cs.CV) [pdf, other]
Title: AVATAR: Unconstrained Audiovisual Speech Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124]  arXiv:2206.07882 (cross-list from cs.CL) [pdf, other]
Title: Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization
Comments: 5 pages, 2 figures, 1 table. Paper accepted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125]  arXiv:2206.08835 (cross-list from cs.CL) [pdf, other]
Title: What can Speech and Language Tell us About the Working Alliance in Psychotherapy
Comments: Accepted at Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126]  arXiv:2206.08864 (cross-list from cs.LG) [pdf, other]
Title: Avoid Overfitting User Specific Information in Federated Keyword Spotting
Comments: Accepted by Interspeech 2022
Subjects: Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127]  arXiv:2206.09790 (cross-list from cs.CL) [pdf, other]
Title: The Makerere Radio Speech Corpus: A Luganda Radio Corpus for Automatic Speech Recognition
Comments: Proceedings of the 13th Conference on Language Resources and Evaluation (LREC 2022), pages 1945 to 1954 Marseille, 20 to 25 June 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128]  arXiv:2206.10125 (cross-list from cs.CL) [pdf, ps, other]
Title: Supervision-Guided Codebooks for Masked Prediction in Speech Pre-training
Comments: To appear in Proc. Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129]  arXiv:2206.10188 (cross-list from cs.LG) [pdf, other]
Title: Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition
Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130]  arXiv:2206.10249 (cross-list from cs.HC) [pdf, other]
Title: Incorporating Voice Instructions in Model-Based Reinforcement Learning for Self-Driving Cars
Comments: NeurIPS 2021 Workshop on Machine Learning for Autonomous Driving
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131]  arXiv:2206.10411 (cross-list from cs.CV) [pdf, other]
Title: Audio-video fusion strategies for active speaker detection in meetings
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132]  arXiv:2206.10861 (cross-list from cs.CV) [pdf, other]
Title: UniCon+: ICTCAS-UCAS Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2022
Comments: 5 pages, 3 figures; technical report for AVA Challenge (see this https URL) at the International Challenge on Activity Recognition (ActivityNet), CVPR 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133]  arXiv:2206.12340 (cross-list from cs.HC) [pdf]
Title: How to hide your voice: Noise-cancelling bird photography blind
Comments: 26 pages, 11 figures. Revised argument in sections 2 and 4, results unchanged, references added
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134]  arXiv:2206.12484 (cross-list from cs.LG) [pdf, other]
Title: An Intensity and Phase Stacked Analysis of Phase-OTDR System using Deep Transfer Learning and Recurrent Neural Networks
Comments: 15 pages, 9 figures. Title updated
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135]  arXiv:2206.12638 (cross-list from cs.CL) [pdf, other]
Title: Distilling a Pretrained Language Model to a Multilingual ASR Model
Comments: Accepted to Interspeech 2022. Official implementation provided in this https URL
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136]  arXiv:2206.12693 (cross-list from cs.CL) [pdf, other]
Title: TEVR: Improving Speech Recognition by Token Entropy Variance Reduction
Comments: 10 pages including 2 pages appendix, 1 figure, 6 tables
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137]  arXiv:2206.12759 (cross-list from cs.CL) [pdf, other]
Title: Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective
Comments: INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138]  arXiv:2206.12772 (cross-list from cs.CV) [pdf, other]
Title: Exploiting Transformation Invariance and Equivariance for Self-supervised Sound Localisation
Comments: Camera-ready Version for ACMMM 2022, Project page is this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[139]  arXiv:2206.12879 (cross-list from cs.CL) [pdf, ps, other]
Title: Data Augmentation for Dementia Detection in Spoken Language
Comments: Accepted to INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140]  arXiv:2206.12931 (cross-list from cs.CL) [pdf, ps, other]
Title: Annotated Speech Corpus for Low Resource Indian Languages: Awadhi, Bhojpuri, Braj and Magahi
Comments: Speech for Social Good Workshop, 2022, Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141]  arXiv:2206.13135 (cross-list from cs.CL) [pdf]
Title: TALCS: An Open-Source Mandarin-English Code-Switching Corpus and a Speech Recognition Baseline
Comments: accepted by INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142]  arXiv:2206.13390 (cross-list from cs.CV) [pdf, other]
Title: A Comprehensive Survey on Video Saliency Detection with Auditory Information: the Audio-visual Consistency Perceptual is the Key!
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143]  arXiv:2206.13415 (cross-list from cs.CL) [pdf, ps, other]
Title: Is the Language Familiarity Effect gradual? A computational modelling approach
Comments: 8 pages, 2 figures, accepted at CogSci 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144]  arXiv:2206.14009 (cross-list from cs.CV) [pdf, other]
Title: Show Me Your Face, And I'll Tell You How You Speak
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[145]  arXiv:2206.14053 (cross-list from cs.CL) [pdf]
Title: Bengali Common Voice Speech Dataset for Automatic Speech Recognition
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146]  arXiv:2206.14142 (cross-list from cs.MM) [pdf, other]
Title: Let the paintings play
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147]  arXiv:2206.14589 (cross-list from cs.CL) [pdf, other]
Title: Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
Subjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148]  arXiv:2206.14660 (cross-list from cs.CL) [pdf, other]
Title: The THUEE System Description for the IARPA OpenASR21 Challenge
Comments: accepted by INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149]  arXiv:2206.14716 (cross-list from cs.CL) [pdf, other]
Title: Improving Deliberation by Text-Only and Semi-Supervised Training
Comments: Accepted by Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150]  arXiv:2206.00888 (cross-list from eess.AS) [pdf, other]
Title: Squeezeformer: An Efficient Transformer for Automatic Speech Recognition
Comments: NeurIPS 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[151]  arXiv:2206.00951 (cross-list from eess.AS) [pdf, other]
Title: Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[152]  arXiv:2206.00970 (cross-list from eess.AS) [pdf, other]
Title: Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[153]  arXiv:2206.01205 (cross-list from eess.AS) [pdf, other]
Title: Snow Mountain: Dataset of Audio Recordings of The Bible in Low Resource Languages
Comments: See dataset at this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[154]  arXiv:2206.01948 (cross-list from eess.AS) [pdf, other]
Title: STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[155]  arXiv:2206.02124 (cross-list from eess.AS) [pdf, other]
Title: Sampling Frequency Independent Dialogue Separation
Comments: accepted into EUSIPCO 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[156]  arXiv:2206.02125 (cross-list from eess.AS) [pdf, other]
Title: Geometrically-Motivated Primary-Ambient Decomposition With Center-Channel Extraction
Comments: accepted into EUSIPCO 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[157]  arXiv:2206.02147 (cross-list from eess.AS) [pdf, other]
Title: Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech
Comments: Accepted by NeurIPS 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[158]  arXiv:2206.02432 (cross-list from eess.AS) [pdf, other]
Title: Online Neural Diarization of Unlimited Numbers of Speakers
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[159]  arXiv:2206.02512 (cross-list from eess.AS) [pdf, other]
Title: UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder
Comments: Under Review
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[160]  arXiv:2206.02639 (cross-list from eess.AS) [pdf, other]
Title: Continuous-Time Analog Filters for Audio Edge Intelligence: Review and Analysis on Design Techniques
Comments: 16 pages, 17 figures
Subjects: Audio and Speech Processing (eess.AS); Hardware Architecture (cs.AR); Sound (cs.SD)
[161]  arXiv:2206.03104 (cross-list from stat.AP) [pdf, other]
Title: Crossing the Linguistic Causeway: A Binational Approach for Translating Soundscape Attributes to Bahasa Melayu
Comments: Under review for Applied Acoustics (Special Issue on Soundscape Attributes Translation: Current Projects and Challenges)
Subjects: Applications (stat.AP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162]  arXiv:2206.03400 (cross-list from eess.AS) [pdf, ps, other]
Title: The Influence of Dataset Partitioning on Dysfluency Detection Systems
Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[163]  arXiv:2206.04305 (cross-list from eess.AS) [pdf, other]
Title: Context-based out-of-vocabulary word recovery for ASR systems in Indian languages
Comments: 12 pages
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[164]  arXiv:2206.04850 (cross-list from eess.AS) [pdf, other]
Title: Feature-informed Embedding Space Regularization For Audio Classification
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[165]  arXiv:2206.05462 (cross-list from eess.AS) [pdf, other]
Title: Svadhyaya system for the Second Diagnosing COVID-19 using Acoustics Challenge 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[166]  arXiv:2206.05606 (cross-list from eess.AS) [pdf, other]
Title: Signal-informed DNN-based DOA Estimation combining an External Microphone and GCC-PHAT Features
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[167]  arXiv:2206.06192 (cross-list from eess.AS) [pdf, ps, other]
Title: Toward Zero Oracle Word Error Rate on the Switchboard Benchmark
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[168]  arXiv:2206.06208 (cross-list from eess.AS) [pdf, ps, other]
Title: Automated Evaluation of Standardized Dementia Screening Tests
Comments: Submitted to Interspeech 2022. arXiv admin note: text overlap with arXiv:2206.05018
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[169]  arXiv:2206.07430 (cross-list from eess.AS) [pdf, ps, other]
Title: Residual Language Model for End-to-end Speech Recognition
Comments: Accepted for Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[170]  arXiv:2206.07569 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Voice Conversion with Information Perturbation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[171]  arXiv:2206.07917 (cross-list from eess.AS) [pdf, other]
Title: To Dereverb Or Not to Dereverb? Perceptual Studies On Real-Time Dereverberation Targets
Comments: 5 pages
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[172]  arXiv:2206.07931 (cross-list from eess.AS) [pdf, other]
Title: DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[173]  arXiv:2206.08058 (cross-list from eess.AS) [pdf, other]
Title: Nonwords Pronunciation Classification in Language Development Tests for Preschool Children
Comments: Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[174]  arXiv:2206.08174 (cross-list from eess.AS) [pdf, other]
Title: Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations
Comments: 5 pages, 2 figures, 3 tables. Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[175]  arXiv:2206.08525 (cross-list from eess.AS) [pdf, other]
Title: Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios(V1)
Comments: 4 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[176]  arXiv:2206.09072 (cross-list from eess.AS) [pdf, other]
Title: Semi-supervised Time Domain Target Speaker Extraction with Attention
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[177]  arXiv:2206.09102 (cross-list from eess.AS) [pdf, other]
Title: Decoupled Federated Learning for ASR with Non-IID Data
Comments: Accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Sound (cs.SD)
[178]  arXiv:2206.09396 (cross-list from eess.AS) [pdf, other]
Title: Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping
Comments: proceedings of INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[179]  arXiv:2206.09507 (cross-list from eess.AS) [pdf, other]
Title: Resource-Efficient Separation Transformer
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[180]  arXiv:2206.09523 (cross-list from eess.AS) [pdf, other]
Title: Towards Trustworthy Edge Intelligence: Insights from Voice-Activated Services
Subjects: Audio and Speech Processing (eess.AS); Computers and Society (cs.CY); Sound (cs.SD)
[181]  arXiv:2206.09556 (cross-list from eess.AS) [pdf, other]
Title: An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models
Comments: Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[182]  arXiv:2206.09783 (cross-list from eess.AS) [pdf, other]
Title: Boosting Cross-Domain Speech Recognition with Self-Supervision
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[183]  arXiv:2206.11000 (cross-list from eess.AS) [pdf, other]
Title: A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement
Comments: Published @ Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[184]  arXiv:2206.11045 (cross-list from eess.AS) [pdf, other]
Title: COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[185]  arXiv:2206.11181 (cross-list from eess.AS) [pdf, other]
Title: On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement
Comments: Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[186]  arXiv:2206.11558 (cross-list from eess.AS) [pdf, other]
Title: Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis
Comments: Accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[187]  arXiv:2206.11640 (cross-list from eess.AS) [pdf, other]
Title: Speaker-Independent Microphone Identification in Noisy Conditions
Journal-ref: in European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 2022, pp. 1047-1051
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[188]  arXiv:2206.11703 (cross-list from eess.AS) [pdf, other]
Title: Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes
Comments: Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[189]  arXiv:2206.12040 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue
Comments: 5 pages, 3 figures, accepted for INTERSPEECH 2022. Audio samples: this https URL
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[190]  arXiv:2206.12045 (cross-list from eess.AS) [pdf, other]
Title: Confidence Score Based Conformer Speaker Adaptation for Speech Recognition
Comments: Accepted to INTERSPEECH 2022. arXiv admin note: text overlap with arXiv:2206.11596
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[191]  arXiv:2206.12059 (cross-list from eess.AS) [pdf]
Title: Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes
Comments: Technical Report submitted for DCASE2022 Challenge Task3
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[192]  arXiv:2206.12283 (cross-list from eess.AS) [pdf, other]
Title: Open-source objective-oriented framework for head-related transfer function
Authors: Adam Szwajcowski
Comments: Not submitted anywhere in the current form
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[193]  arXiv:2206.12285 (cross-list from eess.AS) [pdf, other]
Title: Speech Quality Assessment through MOS using Non-Matching References
Comments: To Appear, Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[194]  arXiv:2206.12297 (cross-list from eess.AS) [pdf, other]
Title: SAQAM: Spatial Audio Quality Assessment Metric
Comments: To Appear, Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[195]  arXiv:2206.12489 (cross-list from eess.AS) [pdf, other]
Title: Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models
Comments: Submitted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[196]  arXiv:2206.12774 (cross-list from eess.AS) [pdf, other]
Title: Meta Auxiliary Learning for Low-resource Spoken Language Understanding
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[197]  arXiv:2206.12857 (cross-list from eess.AS) [pdf, other]
Title: Transport-Oriented Feature Aggregation for Speaker Embedding Learning
Comments: Accepted for presentation at INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[198]  arXiv:2206.13014 (cross-list from eess.AS) [pdf, other]
Title: Joint Optimization of Sampling Rate Offsets Based on Entire Signal Relationship Among Distributed Microphones
Comments: 5 pages, 2 figures, accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[199]  arXiv:2206.13044 (cross-list from eess.AS) [pdf, other]
Title: Extended U-Net for Speaker Verification in Noisy Environments
Comments: 5 pages, 2 figures, 4 tables, accepted to 2022 Interspeech as a conference paper
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[200]  arXiv:2206.13066 (cross-list from eess.AS) [pdf, other]
Title: Detection of Doctored Speech: Towards an End-to-End Parametric Learn-able Filter Approach
Authors: Rohit Arora
Comments: arXiv admin note: text overlap with arXiv:1904.05441 by other authors
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[201]  arXiv:2206.13232 (cross-list from eess.AS) [pdf, other]
Title: Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection
Comments: 5 pages, 1 figure, accepted by INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[202]  arXiv:2206.13240 (cross-list from eess.AS) [pdf, other]
Title: A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data
Comments: Accepted at ECNLP @ACL 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[203]  arXiv:2206.13272 (cross-list from eess.AS) [pdf, other]
Title: Wideband Audio Waveform Evaluation Networks: Efficient, Accurate Estimation of Speech Qualities
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[204]  arXiv:2206.13310 (cross-list from eess.AS) [pdf, other]
Title: Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. Changes: added reference
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[205]  arXiv:2206.13365 (cross-list from eess.AS) [pdf, other]
Title: Interpretable Acoustic Representation Learning on Breathing and Speech Signals for COVID-19 Detection
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[206]  arXiv:2206.13404 (cross-list from eess.AS) [pdf, other]
Title: Avocodo: Generative Adversarial Network for Artifact-free Vocoder
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[207]  arXiv:2206.13411 (cross-list from eess.AS) [pdf, other]
Title: Audio Similarity is Unreliable as a Proxy for Audio Quality
Comments: To Appear, Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[208]  arXiv:2206.13443 (cross-list from eess.AS) [pdf, other]
Title: CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer
Comments: Accepted to be published in the Proceedings of InterSpeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[209]  arXiv:2206.13762 (cross-list from eess.AS) [pdf, other]
Title: A Hierarchical Speaker Representation Framework for One-shot Singing Voice Conversion
Comments: Accepted to INTERSPEECH 2022; made some modifications to Fig. 1 so that the system architecture is clearer
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[210]  arXiv:2206.13768 (cross-list from eess.AS) [pdf, ps, other]
Title: Algorithms for audio inpainting based on probabilistic nonnegative matrix factorization
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[211]  arXiv:2206.13807 (cross-list from eess.AS) [pdf, other]
Title: Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector
Comments: 5 pages, 4 figures, 5 tables, accepted to 2022 Interspeech as a conference paper
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[212]  arXiv:2206.13808 (cross-list from eess.AS) [pdf, other]
Title: Speaker Verification in Multi-Speaker Environments Using Temporal Feature Fusion
Comments: To be presented at EUSIPCO 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[213]  arXiv:2206.13865 (cross-list from eess.AS) [pdf, other]
Title: RetrieverTTS: Modeling Decomposed Factors for Text-Based Speech Insertion
Comments: 5 pages, 1 figure, 3 tables. Accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[214]  arXiv:2206.14165 (cross-list from eess.AS) [pdf, other]
Title: Expressive, Variable, and Controllable Duration Modelling in TTS
Comments: Accepted to be published in the Proceedings of InterSpeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[215]  arXiv:2206.14357 (cross-list from eess.AS) [pdf, other]
Title: Comparing Conventional Pitch Detection Algorithms with a Neural Network Approach
Authors: Anja Kroon (McGill University)
Comments: 6 pages, 11 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[216]  arXiv:2206.14524 (cross-list from eess.AS) [pdf]
Title: A light-weight full-band speech enhancement model
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[217]  arXiv:2206.14639 (cross-list from eess.AS) [pdf, other]
Title: DDKtor: Automatic Diadochokinetic Speech Analysis
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[218]  arXiv:2206.14962 (cross-list from eess.AS) [pdf, other]
Title: GLD-Net: Improving Monaural Speech Enhancement by Learning Global and Local Dependency Features with GLD Block
Comments: Accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[219]  arXiv:2206.14964 (cross-list from eess.AS) [pdf, other]
Title: Improving Visual Speech Enhancement Network by Learning Audio-visual Affinity with Multi-head Attention
Comments: Accepted by Interspeech 2022. arXiv admin note: substantial text overlap with arXiv:2101.06268
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[220]  arXiv:2206.14984 (cross-list from eess.AS) [pdf, other]
Title: TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Comments: Accepted to the conference of INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[221]  arXiv:2206.15356 (cross-list from eess.AS) [pdf, other]
Title: Acoustic Room Compensation Using Local PCA-based Room Average Power Response Estimation
Comments: 5 pages, 7 figures, to appear in IWAENC 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)