Sound

Authors and titles for cs.SD in Oct 2021

[1]  arXiv:2110.00046 [pdf, other]
Title: SpliceOut: A Simple and Efficient Audio Augmentation Method
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2]  arXiv:2110.00155 [pdf, other]
Title: Incremental Layer-wise Self-Supervised Learning for Efficient Speech Domain Adaptation On Device
Comments: 5 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3]  arXiv:2110.00570 [pdf, other]
Title: Leveraging Low-Distortion Target Estimates for Improved Speech Enhancement
Comments: in submission
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4]  arXiv:2110.00794 [pdf, other]
Title: Processing Phoneme Specific Segments for Cleft Lip and Palate Speech Enhancement
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[5]  arXiv:2110.00940 [pdf, other]
Title: PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation Extraction
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6]  arXiv:2110.01009 [pdf, other]
Title: Enriching Ontology with Temporal Commonsense for Low-Resource Audio Tagging
Comments: CIKM 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7]  arXiv:2110.01147 [pdf, other]
Title: On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[8]  arXiv:2110.01210 [pdf, other]
Title: Audio Captioning Using Sound Event Detection
Comments: Submitted to DCASE 2021 Challenge
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9]  arXiv:2110.01367 [pdf, other]
Title: Audio-Visual Evaluation of Oratory Skills
Comments: TransAI 2021
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[10]  arXiv:2110.01425 [pdf, other]
Title: Building a Noisy Audio Dataset to Evaluate Machine Learning Approaches for Automatic Speech Recognition Systems
Comments: Tech report series Monografias em Ciência da Computação, September 2021, Dep. Informática PUC-Rio, RJ, Brazil, ISSN 0103-9741
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[11]  arXiv:2110.02011 [pdf, other]
Title: Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[12]  arXiv:2110.02375 [pdf, other]
Title: Interpreting intermediate convolutional layers in unsupervised acoustic word classification
Comments: ICASSP 2022
Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[13]  arXiv:2110.02411 [pdf, other]
Title: Voice Aging with Audio-Visual Style Transfer
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14]  arXiv:2110.02584 [pdf, other]
Title: EdiTTS: Score-based Editing for Controllable Text-to-Speech
Comments: 4 pages, 3 figures, 3 tables, INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15]  arXiv:2110.02791 [pdf, other]
Title: Spell my name: keyword boosted speech recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[16]  arXiv:2110.02878 [pdf, other]
Title: An Investigation of the Effectiveness of Phase for Audio Classification
Comments: 5 pages, 3 figures
Journal-ref: ICASSP (2022) 3708-3712
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[17]  arXiv:2110.03156 [pdf, other]
Title: StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis
Comments: Submitted to ICASSP 2022. 5 pages, 3 figures, 1 table. Our codes are available at: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[18]  arXiv:2110.03174 [pdf, other]
Title: Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[19]  arXiv:2110.03183 [pdf, other]
Title: Attention is All You Need? Good Embeddings with Statistics are enough:Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....
Authors: Prateek Verma
Comments: IEEE Copyright: written as told
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[20]  arXiv:2110.03243 [pdf, ps, other]
Title: Sound Event Detection Guided by Semantic Contexts of Scenes
Comments: Accepted to ICASSP 2022
Subjects: Sound (cs.SD)
[21]  arXiv:2110.03251 [pdf, other]
Title: A Cough-based deep learning framework for detecting COVID-19
Comments: COVID-19, EMBC-2022, DiCOVA, top 2nd, benchmark on Spec > 0.95%
Journal-ref: EMBC 44 (2022) 3422-3425
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22]  arXiv:2110.03272 [pdf, ps, other]
Title: A Novel Blind Source Separation Framework Towards Maximum Signal-To-Interference Ratio
Comments: 5 pages, 2 figures
Subjects: Sound (cs.SD)
[23]  arXiv:2110.03370 [pdf, other]
Title: WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[24]  arXiv:2110.03380 [pdf, other]
Title: Advancing the dimensionality reduction of speaker embeddings for speaker diarisation: disentangling noise and informing speech activity
Comments: This paper was submitted to ICASSP 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[25]  arXiv:2110.03390 [pdf, other]
Title: GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
Comments: 9 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[26]  arXiv:2110.03414 [pdf, other]
Title: SERAB: A multi-lingual benchmark for speech emotion recognition
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27]  arXiv:2110.03536 [pdf, other]
Title: Prototype Learning for Interpretable Respiratory Sound Analysis
Comments: Technical report of the paper accepted by IEEE ICASSP 2022
Subjects: Sound (cs.SD)
[28]  arXiv:2110.03744 [pdf, other]
Title: Voice Reenactment with F0 and timing constraints and adversarial learning of conversions
Comments: arXiv admin note: text overlap with arXiv:2107.12346
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29]  arXiv:2110.03771 [pdf, other]
Title: Wake-Cough: cough spotting and cougher identification for personalised long-term cough monitoring
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30]  arXiv:2110.04057 [pdf, other]
Title: FAST-RIR: Fast neural diffuse room impulse response generator
Comments: Accepted to ICASSP 2022. More results and source code are available at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[31]  arXiv:2110.04091 [pdf, other]
Title: Affective Burst Detection from Speech using Kernel-fusion Dilated Convolutional Neural Networks
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[32]  arXiv:2110.04284 [pdf, other]
Title: Auto-DSP: Learning to Optimize Acoustic Echo Cancellers
Comments: Accepted to the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). Source code and audio examples: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33]  arXiv:2110.04438 [pdf, other]
Title: Towards Lightweight Applications: Asymmetric Enroll-Verify Structure for Speaker Verification
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34]  arXiv:2110.04451 [pdf, other]
Title: Using multiple reference audios and style embedding constraints for speech synthesis
Comments: 5 pages, 3 figures, submitted to ICASSP 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[35]  arXiv:2110.04474 [pdf, other]
Title: A Mutual learning framework for Few-shot Sound Event Detection
Comments: Accepted by ICASSP2022. arXiv admin note: text overlap with arXiv:2106.12252 by other authors
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36]  arXiv:2110.04486 [pdf, other]
Title: PAMA-TTS: Progression-Aware Monotonic Attention for Stable Seq2Seq TTS With Accurate Phoneme Duration Control
Comments: Accepted by ICASSP 2022. 5 pages, 4 figures, 3 tables. Audio samples are available at: this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[37]  arXiv:2110.04621 [pdf, other]
Title: Universal Paralinguistic Speech Representations Using Self-Supervised Conformers
Journal-ref: ICASSP 2022-2022 IEEE
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38]  arXiv:2110.04656 [pdf, other]
Title: Streaming on-device detection of device directed speech from voice and touch-based invocation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[39]  arXiv:2110.04678 [pdf, other]
Title: An Overview of Techniques for Biomarker Discovery in Voice Signal
Comments: Last two authors contributed equally to the paper
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[40]  arXiv:2110.04684 [pdf, other]
Title: Can Audio Captions Be Evaluated with Image Caption Metrics?
Comments: ICASSP 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[41]  arXiv:2110.04754 [pdf, other]
Title: Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[42]  arXiv:2110.04765 [pdf, other]
Title: Multi-task Learning with Metadata for Music Mood Classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[43]  arXiv:2110.04946 [pdf, other]
Title: LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44]  arXiv:2110.04972 [pdf, ps, other]
Title: Kernel Learning For Sound Field Estimation With L1 and L2 Regularizations
Comments: Accepted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45]  arXiv:2110.05020 [pdf, other]
Title: MELONS: generating melody with long-term structure using transformers and structure graph
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[46]  arXiv:2110.05033 [pdf, other]
Title: Pitch Preservation In Singing Voice Synthesis
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[47]  arXiv:2110.05042 [pdf, other]
Title: Multi-query multi-head attention pooling and Inter-topK penalty for speaker verification
Comments: submitted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48]  arXiv:2110.05054 [pdf, other]
Title: Source Mixing and Separation Robust Audio Steganography
Comments: Accepted to ICASSP 2022
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[49]  arXiv:2110.05059 [pdf, other]
Title: Amicable examples for informed source separation
Comments: Accepted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2110.05069 [pdf, other]
Title: Efficient Training of Audio Transformers with Patchout
Comments: Submitted to Interspeech 2022. Source code: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[51]  arXiv:2110.05087 [pdf, ps, other]
Title: A Multi-Resolution Front-End for End-to-End Speech Anti-Spoofing
Comments: submitted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[52]  arXiv:2110.05580 [pdf, other]
Title: vocadito: A dataset of solo vocals with $f_0$, note, and lyric annotations
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53]  arXiv:2110.05587 [pdf, other]
Title: Evaluation of Latent Space Disentanglement in the Presence of Interdependent Attributes
Comments: Submitted to the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Information Theory (cs.IT); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[54]  arXiv:2110.05713 [pdf, other]
Title: Foster Strengths and Circumvent Weaknesses: a Speech Enhancement Framework with Two-branch Collaborative Learning
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55]  arXiv:2110.05765 [pdf, other]
Title: Music Sentiment Transfer
Comments: NSF REU: Computational Methods for Understanding Music, Media, and Minds, University of Rochester
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[56]  arXiv:2110.05777 [pdf, other]
Title: Large-scale Self-Supervised Speech Representation Learning for Automatic Speaker Verification
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57]  arXiv:2110.05798 [pdf, other]
Title: Adapting TTS models For New Speakers using Transfer Learning
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[58]  arXiv:2110.05866 [pdf, ps, other]
Title: MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[59]  arXiv:2110.05966 [pdf, other]
Title: Multi-channel Narrow-band Deep Speech Separation with Full-band Permutation Invariant Training
Comments: accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[60]  arXiv:2110.05975 [pdf, other]
Title: Multi-Channel Far-Field Speaker Verification with Large-Scale Ad-hoc Microphone Arrays
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[61]  arXiv:2110.06100 [pdf, other]
Title: Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Comments: 5 pages, 1 figure, accepted by DCASE 2021 workshop
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[62]  arXiv:2110.06123 [pdf, other]
Title: COVID-19 Diagnosis from Cough Acoustics using ConvNets and Data Augmentation
Comments: DiCOVA, top 1st, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[63]  arXiv:2110.06280 [pdf, other]
Title: S3PRL-VC: Open-source Voice Conversion Framework with Self-supervised Speech Representations
Comments: Submitted to ICASSP 2022. Code available at: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64]  arXiv:2110.06323 [pdf, other]
Title: An Annihilating Filter-Based DOA Estimation for Uniform Linear Array
Authors: Son Phan, Lam Pham
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[65]  arXiv:2110.06371 [pdf, other]
Title: Algorithmic Composition by Autonomous Systems with Multiple Time-Scales
Authors: Risto Holopainen
Comments: 28 pages, 3 figures. Submitted to Divergence Press
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Adaptation and Self-Organizing Systems (nlin.AO)
[66]  arXiv:2110.06467 [pdf, other]
Title: Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[67]  arXiv:2110.06494 [pdf, other]
Title: Music Source Separation with Deep Equilibrium Models
Comments: 5 pages, 4 figures, accepted for publication in IEEE ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68]  arXiv:2110.06501 [pdf, other]
Title: Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
Comments: 5 pages, 2 figures, accepted for publication in IEEE ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[69]  arXiv:2110.06525 [pdf, other]
Title: Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks
Comments: To be published at ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[70]  arXiv:2110.06534 [pdf, other]
Title: Simple Attention Module based Speaker Verification with Iterative noisy label detection
Comments: submitted to ICASSP2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71]  arXiv:2110.06543 [pdf, ps, other]
Title: EIHW-MTG DiCOVA 2021 Challenge System Report
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72]  arXiv:2110.06565 [pdf, other]
Title: Duality Temporal-channel-frequency Attention Enhanced Speaker Representation Learning
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73]  arXiv:2110.06634 [pdf, other]
Title: End-to-end translation of human neural activity to speech with a dual-dual generative adversarial network
Comments: 12 pages, 13 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[74]  arXiv:2110.06707 [pdf, other]
Title: Singer separation for karaoke content generation
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[75]  arXiv:2110.06999 [pdf, other]
Title: Study of positional encoding approaches for Audio Spectrogram Transformers
Comments: Submitted to ICASSP 2022. 5 pages, 3 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[76]  arXiv:2110.07027 [pdf, other]
Title: Comparison of SVD and factorized TDNN approaches for speech to text
Comments: 4 pages, 1 figure, 3 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[77]  arXiv:2110.07210 [pdf, other]
Title: Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[78]  arXiv:2110.07311 [pdf, other]
Title: SpecSinGAN: Sound Effect Variation Synthesis Using Single-Image GANs
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[79]  arXiv:2110.07313 [pdf, other]
Title: Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Comments: 4 pages. Submitted to ICASSP in Oct 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[80]  arXiv:2110.07393 [pdf, other]
Title: M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81]  arXiv:2110.07607 [pdf, other]
Title: HumBugDB: A Large-scale Acoustic Mosquito Dataset
Comments: Accepted at the 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks. 10 pages main, 39 pages including appendix. This paper accompanies the dataset found at this https URL with corresponding code at this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[82]  arXiv:2110.08090 [pdf, other]
Title: Using DeepProbLog to perform Complex Event Processing on an Audio Stream
Comments: 8 pages, 3 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[83]  arXiv:2110.08213 [pdf, other]
Title: Towards Identity Preserving Normal to Dysarthric Voice Conversion
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[84]  arXiv:2110.08352 [pdf, other]
Title: Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[85]  arXiv:2110.08437 [pdf, other]
Title: NN3A: Neural Network supported Acoustic Echo Cancellation, Noise Suppression and Automatic Gain Control for Real-Time Communications
Comments: submitted to ICASSP2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86]  arXiv:2110.08439 [pdf, other]
Title: Controllable Multichannel Speech Dereverberation based on Deep Neural Networks
Comments: submitted to ICASSP2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87]  arXiv:2110.08634 [pdf, other]
Title: Towards Robust Waveform-Based Acoustic Models
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[88]  arXiv:2110.08731 [pdf, ps, other]
Title: Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms
Comments: 7 pages, 2 figures, 4 tables, accepted to Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC 2021)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[89]  arXiv:2110.08821 [pdf, other]
Title: Storage and Authentication of Audio Footage for IoAuT Devices Using Distributed Ledger Technology
Comments: 11 pages, 3 Figures, 1 code listing
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[90]  arXiv:2110.08895 [pdf, other]
Title: DECAR: Deep Clustering for learning general-purpose Audio Representations
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[91]  arXiv:2110.09103 [pdf, other]
Title: LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech
Comments: Submitted to ICASSP 2022. Code available at: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[92]  arXiv:2110.09116 [pdf, ps, other]
Title: Real Additive Margin Softmax for Speaker Verification
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[93]  arXiv:2110.09121 [pdf, ps, other]
Title: KaraTuner: Towards end to end natural pitch correction for singing voice in karaoke
Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94]  arXiv:2110.09127 [pdf, other]
Title: SpecTNT: a Time-Frequency Transformer for Music Audio
Comments: 6 pages
Journal-ref: International Society for Music Information Retrieval (ISMIR) 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[95]  arXiv:2110.09223 [pdf, other]
Title: Learning Models for Query by Vocal Percussion: A Comparative Study
Comments: Published in proceedings of the International Computer Music Conference (ICMC) 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96]  arXiv:2110.09239 [pdf, ps, other]
Title: EIHW-MTG: Second DiCOVA Challenge System Report
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[97]  arXiv:2110.09441 [pdf, other]
Title: FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[98]  arXiv:2110.09598 [pdf, ps, other]
Title: Adversarial Domain Adaptation with Paired Examples for Acoustic Scene Classification on Different Recording Devices
Comments: Accepted for publication in the Proceedings of the 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021
Journal-ref: 2021 29th European Signal Processing Conference (EUSIPCO), Dublin, Ireland, 2021, pp. 1030-103
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[99]  arXiv:2110.09600 [pdf, other]
Title: Who calls the shots? Rethinking Few-Shot Learning for Audio
Comments: WASPAA 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100]  arXiv:2110.09605 [pdf, other]
Title: Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[101]  arXiv:2110.09698 [pdf, other]
Title: Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge
Comments: 5 pages, 3 figures; accepted by Interspeech 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[102]  arXiv:2110.09720 [pdf, other]
Title: Rep Works in Speaker Verification
Comments: submitted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103]  arXiv:2110.09780 [pdf, other]
Title: Improving Emotional Speech Synthesis by Using SUS-Constrained VAE and Text Encoder Aggregation
Comments: accepted by ICASSP2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104]  arXiv:2110.09784 [pdf, other]
Title: SSAST: Self-Supervised Audio Spectrogram Transformer
Comments: Accepted at AAAI2022. Code at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[105]  arXiv:2110.09814 [pdf, other]
Title: Speech Pattern based Black-box Model Watermarking for Automatic Speech Recognition
Comments: 5 pages, 2 figures. Accepted by 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[106]  arXiv:2110.10010 [pdf, other]
Title: Temporal separation of whale vocalizations from background oceanic noise using a power calculation
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107]  arXiv:2110.10103 [pdf, other]
Title: Continual self-training with bootstrapped remixing for speech enhancement
Comments: To appear in Proc. ICASSP 2022, May 22-27, 2022, Singapore
Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[108]  arXiv:2110.10402 [pdf, other]
Title: An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR
Comments: Accepted to APSIPA 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[109]  arXiv:2110.10491 [pdf, ps, other]
Title: A Study On Data Augmentation In Voice Anti-Spoofing
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[110]  arXiv:2110.10593 [pdf, ps, other]
Title: Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method
Comments: Submitted to Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[111]  arXiv:2110.10739 [pdf, other]
Title: Adapting Speech Separation to Real-World Meetings Using Mixture Invariant Training
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112]  arXiv:2110.10757 [pdf, other]
Title: TPARN: Triple-path Attentive Recurrent Network for Time-domain Multichannel Speech Enhancement
Comments: Accepted for publication in ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113]  arXiv:2110.10983 [pdf, other]
Title: Optimizing Multi-Taper Features for Deep Speaker Verification
Comments: To appear in IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[114]  arXiv:2110.11499 [pdf, other]
Title: Wav2CLIP: Learning Robust Audio Representations From CLIP
Comments: Copyright 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[115]  arXiv:2110.11807 [pdf, ps, other]
Title: Signal-Envelope: A C++ library with Python bindings for temporal envelope estimation
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116]  arXiv:2110.11844 [pdf, other]
Title: Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network
Comments: Accepted for publication in INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117]  arXiv:2110.12138 [pdf, other]
Title: Optimizing Alignment of Speech and Language Latent Spaces for End-to-End Speech Recognition and Understanding
Comments: submitted to ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118]  arXiv:2110.12539 [pdf, other]
Title: Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech
Comments: 5 pages, 5 figures, accepted at IberSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[119]  arXiv:2110.12561 [pdf, other]
Title: Lhotse: a speech data representation library for the modern deep learning ecosystem
Comments: Accepted for presentation at NeurIPS 2021 Data-Centric AI (DCAI) Workshop
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120]  arXiv:2110.12612 [pdf, other]
Title: DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[121]  arXiv:2110.12778 [pdf, other]
Title: A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments
Comments: arXiv admin note: text overlap with arXiv:2105.04488
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[122]  arXiv:2110.12855 [pdf, other]
Title: Actions Speak Louder than Listening: Evaluating Music Style Transfer based on Editing Experience
Comments: 9 pages, Proceedings of the 29th ACM International Conference on Multimedia
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[123]  arXiv:2110.13071 [pdf, other]
Title: Unsupervised Source Separation By Steering Pretrained Music Models
Comments: Submitted to ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[124]  arXiv:2110.13130 [pdf, other]
Title: Multichannel Speech Enhancement without Beamforming
Comments: Accepted for publication in ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125]  arXiv:2110.13323 [pdf, other]
Title: Deep Learning Tools for Audacity: Helping Researchers Expand the Artist's Toolkit
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[126]  arXiv:2110.13465 [pdf, other]
Title: CS-Rep: Making Speaker Verification Networks Embracing Re-parameterization
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127]  arXiv:2110.13589 [pdf, other]
Title: AQP: An Open Modular Python Platform for Objective Speech and Audio Quality Metrics
Comments: 6 pages, 3 figures, accepted and presented at ACM MMSys22, June, 2022, Athlone, Ireland
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128]  arXiv:2110.14131 [pdf, other]
Title: Temporal Knowledge Distillation for On-device Audio Classification
Comments: ICASSP 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[129]  arXiv:2110.14422 [pdf, ps, other]
Title: Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning
Comments: Published in: 2022 International Joint Conference on Neural Networks (IJCNN)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[130]  arXiv:2110.14425 [pdf, other]
Title: Generalizing AUC Optimization to Multiclass Classification for Audio Segmentation With Limited Training Data
Journal-ref: IEEE Signal Processing Letters, vol. 28, pp. 1135-1139, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[131]  arXiv:2110.14434 [pdf, ps, other]
Title: Nonnegative Tucker Decomposition with Beta-divergence for Music Structure Analysis of Audio Signals
Comments: 4 pages, 2 figures, 1 table, 1 algorithm. To be published in GRETSI2022. The algorithm is available at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Numerical Analysis (math.NA)
[132]  arXiv:2110.14437 [pdf, other]
Title: Exploring single-song autoencoding schemes for audio-based music structure analysis
Comments: 4 pages, 4 figures, 2 tables. Rejected from ICASSP 2022, an extended version is available at arXiv:2202.04981
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[133]  arXiv:2110.14513 [pdf, other]
Title: Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations
Comments: Neural Information Processing Systems (NeurIPS) 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[134]  arXiv:2110.15316 [pdf, ps, other]
Title: VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge
Comments: 6 pages, in Chinese language, 3 tables, NCMMC 2021 conference paper
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135]  arXiv:2110.15430 [pdf, other]
Title: Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction
Comments: 5 pages, 1 figure, submitted to ICASSP 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[136]  arXiv:2110.15729 [pdf, ps, other]
Title: Decision Attentive Regularization to Improve Simultaneous Speech Translation Systems
Comments: 5 pages, 3 figures, 1 table
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[137]  arXiv:2110.15792 [pdf, other]
Title: VRAIN-UPV MLLP's system for the Blizzard Challenge 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138]  arXiv:2110.00508 (cross-list from cs.LG) [pdf, other]
Title: An Ensemble-based Multi-Criteria Decision Making Method for COVID-19 Cough Classification
Comments: 21 pages, 6 figures
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[139]  arXiv:2110.01001 (cross-list from cs.MM) [pdf, other]
Title: Multimodal Fusion Based Attentive Networks for Sequential Music Recommendation
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[140]  arXiv:2110.02404 (cross-list from cs.CV) [pdf, other]
Title: 3D-MOV: Audio-Visual LSTM Autoencoder for 3D Reconstruction of Multiple Objects from Video
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141]  arXiv:2110.02405 (cross-list from cs.CV) [pdf, other]
Title: Echo-Reconstruction: Audio-Augmented 3D Scene Reconstruction
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142]  arXiv:2110.02498 (cross-list from cs.CR) [pdf, other]
Title: Adversarial Attacks on Machinery Fault Diagnosis
Comments: 5 pages, 5 figures. Submitted to Interspeech 2022
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143]  arXiv:2110.02891 (cross-list from cs.LG) [pdf, other]
Title: Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Comments: ICML 2022
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144]  arXiv:2110.03047 (cross-list from cs.CL) [pdf, ps, other]
Title: Integrating Categorical Features in End-to-End ASR
Authors: Rongqing Huang
Comments: Submitted to ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145]  arXiv:2110.03281 (cross-list from cs.LG) [pdf, other]
Title: Detecting Autism Spectrum Disorders with Machine Learning Models Using Speech Transcripts
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146]  arXiv:2110.03326 (cross-list from cs.CL) [pdf, other]
Title: Back from the future: bidirectional CTC decoding using future information in speech recognition
Comments: submitted to ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147]  arXiv:2110.03427 (cross-list from cs.LG) [pdf, other]
Title: Is Attention always needed? A Case Study on Language Identification from Speech
Comments: Accepted for publication in Natural Language Engineering
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[148]  arXiv:2110.03560 (cross-list from cs.CL) [pdf, ps, other]
Title: Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149]  arXiv:2110.03609 (cross-list from cs.CL) [pdf, ps, other]
Title: Applying Phonological Features in Multilingual Text-To-Speech
Comments: demo webpage: this https URL
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150]  arXiv:2110.03756 (cross-list from cs.CL) [pdf, ps, other]
Title: Sonorant spectra and coarticulation distinguish speakers with different dialects
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[151]  arXiv:2110.03847 (cross-list from cs.CL) [pdf, other]
Title: Machine Translation Verbosity Control for Automatic Dubbing
Comments: Accepted at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152]  arXiv:2110.03876 (cross-list from cs.CL) [pdf, other]
Title: Phone-to-audio alignment without text: A Semi-supervised Approach
Comments: ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153]  arXiv:2110.03879 (cross-list from cs.CL) [pdf, other]
Title: Explaining the Attention Mechanism of End-to-End Speech Recognition Using Decision Trees
Comments: 10 pages, 5 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154]  arXiv:2110.04267 (cross-list from cs.LG) [pdf, other]
Title: Exploring Heterogeneous Characteristics of Layers in ASR Models for More Efficient Training
Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[155]  arXiv:2110.04590 (cross-list from cs.CL) [pdf, other]
Title: An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition
Comments: To appear in ASRU2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[156]  arXiv:2110.04891 (cross-list from cs.CL) [pdf, other]
Title: Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[157]  arXiv:2110.04923 (cross-list from cs.LG) [pdf, ps, other]
Title: Crack detection using tap-testing and machine learning techniques to prevent potential rockfall incidents
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[158]  arXiv:2110.04934 (cross-list from cs.CL) [pdf, other]
Title: Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition
Comments: Accepted at IEEE ICASSP 2022. 5 pages, 1 figure
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[159]  arXiv:2110.05313 (cross-list from cs.LG) [pdf, other]
Title: Unsupervised Source Separation via Bayesian Inference in the Latent Domain
Comments: 5 pages, 2 figures, submitted to Interspeech 2022
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[160]  arXiv:2110.05354 (cross-list from cs.CL) [pdf, ps, other]
Title: Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition
Comments: 5 pages, in Interspeech 2022
Journal-ref: Interspeech 2022, Incheon, Korea
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[161]  arXiv:2110.05607 (cross-list from cs.LG) [pdf, other]
Title: Partial Variable Training for Efficient On-Device Federated Learning
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162]  arXiv:2110.05752 (cross-list from cs.CL) [pdf, other]
Title: UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training
Comments: ICASSP 2022 Submission
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[163]  arXiv:2110.05941 (cross-list from cs.LG) [pdf, ps, other]
Title: Rank-based loss for learning hierarchical representations
Comments: This version corrects a bug in the baseline results
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[164]  arXiv:2110.06263 (cross-list from cs.CL) [pdf, other]
Title: Speech Summarization using Restricted Self-Attention
Comments: Accepted at ICASSP 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165]  arXiv:2110.07187 (cross-list from cs.CL) [pdf, other]
Title: Revisiting IPA-based Cross-lingual Text-to-speech
Comments: Submitted to ICASSP2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166]  arXiv:2110.07274 (cross-list from cs.CL) [pdf, other]
Title: An Approach to Mispronunciation Detection and Diagnosis with Acoustic, Phonetic and Linguistic (APL) Embeddings
Comments: Accepted by ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[167]  arXiv:2110.07354 (cross-list from cs.LG) [pdf, other]
Title: Music Playlist Title Generation: A Machine-Translation Approach
Comments: Proceedings of the 2nd Workshop on NLP for Music and Spoken Audio, 22nd International Society for Music Information Retrieval Conference (ISMIR)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168]  arXiv:2110.07410 (cross-list from cs.LG) [pdf, other]
Title: Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Comments: 5 pages, 4 figures. Accepted at Detection and Classification of Acoustic Scenes and Events 2021 (DCASE2021)
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169]  arXiv:2110.07592 (cross-list from cs.CL) [pdf, other]
Title: DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances
Comments: Submitted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170]  arXiv:2110.07749 (cross-list from cs.LG) [pdf, other]
Title: Attention-Free Keyword Spotting
Comments: 5 pages: Accepted at PML4DC workshop in ICLR 2022
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[171]  arXiv:2110.07840 (cross-list from cs.CL) [pdf, other]
Title: ESPnet2-TTS: Extending the Edge of TTS Research
Comments: Submitted to ICASSP2022. Demo HP: this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172]  arXiv:2110.07982 (cross-list from cs.CL) [pdf, other]
Title: Scribosermo: Fast Speech-to-Text models for German and other Languages
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[173]  arXiv:2110.08214 (cross-list from cs.CL) [pdf, other]
Title: From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation
Comments: Accepted by Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174]  arXiv:2110.08250 (cross-list from cs.CL) [pdf, other]
Title: Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[175]  arXiv:2110.08626 (cross-list from cs.LG) [pdf, other]
Title: Learning velocity model for complex media with deep convolutional neural networks
Comments: 14 pages, 6 figures, 6 tables
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176]  arXiv:2110.08791 (cross-list from cs.CV) [pdf, other]
Title: Taming Visually Guided Sound Generation
Comments: Accepted as an oral presentation for the BMVC 2021. Code: this https URL. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177]  arXiv:2110.09245 (cross-list from cs.CL) [pdf, other]
Title: Efficient Sequence Training of Attention Models using Approximative Recombination
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178]  arXiv:2110.09264 (cross-list from cs.CL) [pdf, other]
Title: Intent Classification Using Pre-trained Language Agnostic Embeddings For Low Resource Languages
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[179]  arXiv:2110.09324 (cross-list from cs.CL) [pdf, other]
Title: Automatic Learning of Subword Dependent Model Scales
Comments: submitted to ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[180]  arXiv:2110.10429 (cross-list from cs.LG) [pdf, other]
Title: Knowledge distillation from language model to acoustic model: a hierarchical multi-task learning approach
Comments: 4 pages + 1 page for citations + 2 pages for appendix
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[181]  arXiv:2110.12136 (cross-list from cs.CV) [pdf, other]
Title: A Study of Multimodal Person Verification Using Audio-Visual-Thermal Data
Comments: 7 pages, 4 figures, 4 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[182]  arXiv:2110.12408 (cross-list from cs.ET) [pdf, other]
Title: Quantum Computer Music: Foundations and Initial Experiments
Comments: Pre-publication draft, to appear in book 'Quantum Computer Music', E. R. Miranda (Ed.). arXiv admin note: text overlap with arXiv:2006.13849
Subjects: Emerging Technologies (cs.ET); Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantum Physics (quant-ph)
[183]  arXiv:2110.13023 (cross-list from cs.LG) [pdf, other]
Title: ML-Based Analysis to Identify Speech Features Relevant in Predicting Alzheimer's Disease
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184]  arXiv:2110.13250 (cross-list from cs.CR) [pdf, other]
Title: Beyond $L_p$ clipping: Equalization-based Psychoacoustic Attacks against ASRs
Comments: accepted at ACML 2021
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185]  arXiv:2110.13492 (cross-list from cs.LG) [pdf, ps, other]
Title: TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining
Comments: Published as a conference paper at ICASSP 2022, 5 pages, 4 figures, 3 tables
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186]  arXiv:2110.13877 (cross-list from cs.CL) [pdf, other]
Title: Assessing Evaluation Metrics for Speech-to-Speech Translation
Comments: ASRU 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[187]  arXiv:2110.13900 (cross-list from cs.CL) [pdf, other]
Title: WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Comments: Submitted to the Journal of Selected Topics in Signal Processing (JSTSP)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188]  arXiv:2110.14273 (cross-list from cs.CL) [pdf, other]
Title: Deep Learning For Prominence Detection In Children's Read Speech
Comments: Under review at ICASSP 2022. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[189]  arXiv:2110.14957 (cross-list from cs.AI) [pdf, other]
Title: End-to-End Speech Emotion Recognition: Challenges of Real-Life Emergency Call Centers Data Recordings
Authors: Théo Deschamps-Berger (LISN, CNRS), Lori Lamel (LISN, CNRS), Laurence Devillers (LISN, CNRS, SU)
Journal-ref: 2021 9th International Conference on Affective Computing and Intelligent Interaction (ACII), Sep 2021, Nara, Japan
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[190]  arXiv:2110.15222 (cross-list from cs.CL) [pdf, other]
Title: Word-level confidence estimation for RNN transducers
Journal-ref: Proc. ASRU 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191]  arXiv:2110.15704 (cross-list from cs.CL) [pdf, ps, other]
Title: Influence of ASR and Language Model on Alzheimer's Disease Detection
Comments: 5 pages. Submitted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2011.09272
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192]  arXiv:2110.15731 (cross-list from cs.CL) [pdf, other]
Title: CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese
Comments: This paper is under consideration at Language Resources and Evaluation (LREV)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193]  arXiv:2110.15790 (cross-list from cs.IR) [pdf, other]
Title: LSTM-RPA: A Simple but Effective Long Sequence Prediction Algorithm for Music Popularity Prediction
Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Sound (cs.SD); Social and Information Networks (cs.SI); Audio and Speech Processing (eess.AS)
[194]  arXiv:2110.15836 (cross-list from cs.CL) [pdf, ps, other]
Title: Combining Unsupervised and Text Augmented Semi-Supervised Learning for Low Resourced Autoregressive Speech Recognition
Comments: 5 pages, minor changes for camera ready version, to be published in IEEE ICASSP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195]  arXiv:2110.15909 (cross-list from cs.LG) [pdf, other]
Title: Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[196]  arXiv:2110.15941 (cross-list from cs.LG) [pdf, other]
Title: Personalized breath based biometric authentication with wearable multimodality
Comments: 7 pages (2 columns), 5 tables, 7 figures, submitted to ACM Multimedia 2020
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197]  arXiv:2110.00165 (cross-list from eess.AS) [pdf, other]
Title: Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning
Comments: ICASSP 2022 accepted, 5 pages, 2 figures, 5 tables
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[198]  arXiv:2110.00275 (cross-list from eess.AS) [pdf, other]
Title: SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection
Comments: (c) 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1749-1762, 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[199]  arXiv:2110.00745 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Complex-Valued Multidilated Convolutional Neural Network for Joint Acoustic Echo Cancellation and Noise Suppression
Comments: To be presented at the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
Journal-ref: Proceedings of the 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, pp. 656-660
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[200]  arXiv:2110.00797 (cross-list from eess.AS) [pdf, other]
Title: Significance of Data Augmentation for Improving Cleft Lip and Palate Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[201]  arXiv:2110.01077 (cross-list from eess.AS) [pdf, other]
Title: Multi-task Voice Activated Framework using Self-supervised Learning
Comments: Accepted at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[202]  arXiv:2110.01164 (cross-list from eess.AS) [pdf, other]
Title: Decoupling Speaker-Independent Emotions for Voice Conversion Via Source-Filter Networks
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[203]  arXiv:2110.01177 (cross-list from eess.AS) [pdf, other]
Title: The Second DiCOVA Challenge: Dataset and performance analysis for COVID-19 diagnosis using acoustics
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[204]  arXiv:2110.01422 (cross-list from eess.AS) [pdf, ps, other]
Title: Individualized sound pressure equalization in hearing devices exploiting an electro-acoustic model
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[205]  arXiv:2110.01436 (cross-list from eess.AS) [pdf, other]
Title: WaveBeat: End-to-end beat and downbeat tracking in the time domain
Comments: To appear at the 151st AES Convention
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[206]  arXiv:2110.01763 (cross-list from eess.AS) [pdf, other]
Title: DNSMOS P.835: A Non-Intrusive Perceptual Objective Speech Quality Metric to Evaluate Noise Suppressors
Comments: arXiv admin note: substantial text overlap with arXiv:2010.15258
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[207]  arXiv:2110.02077 (cross-list from eess.AS) [pdf, other]
Title: Deep Optimization of Parametric IIR Filters for Audio Equalization
Authors: Giovanni Pepe (1 and 2), Leonardo Gabrielli (1), Stefano Squartini (1), Carlo Tripodi (2), Nicolò Strozzi (2) ((1) Università Politecnica delle Marche, (2) ASK Industries S.p.A.)
Comments: submitted to IEEE/ACM TASLP on 12 May 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[208]  arXiv:2110.02144 (cross-list from eess.AS) [pdf, other]
Title: Late reverberation suppression using U-nets
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[209]  arXiv:2110.02151 (cross-list from eess.AS) [pdf, other]
Title: Detection of blue whale vocalisations using a temporal-domain convolutional neural network
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[210]  arXiv:2110.02189 (cross-list from eess.AS) [pdf, ps, other]
Title: Manifold learning-supported estimation of relative transfer functions for spatial filtering
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[211]  arXiv:2110.02285 (cross-list from eess.AS) [pdf, ps, other]
Title: Modelling of the Fender Bassman 5F6-A Tone Stack
Authors: Steven Fenton
Comments: 5 pages, 6 figures. General Reference Paper
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[212]  arXiv:2110.02345 (cross-list from eess.AS) [pdf, other]
Title: Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding
Comments: arXiv admin note: substantial text overlap with arXiv:2106.02170
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[213]  arXiv:2110.02360 (cross-list from eess.AS) [pdf, other]
Title: Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[214]  arXiv:2110.02592 (cross-list from eess.AS) [pdf, other]
Title: Improving Real-time Score Following in Opera by Combining Music with Lyrics Tracking
Comments: 5 pages, In Proceedings of the 2nd Workshop on NLP for Music and Audio (NLP4MusA), Online, 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[215]  arXiv:2110.02695 (cross-list from eess.AS) [pdf, other]
Title: Lower Interaural Coherence in Off-Signal Bands Impairs Binaural Detection
Comments: 14 pages, 5 figures
Journal-ref: J. Acoust. Soc. Am. 151(6), 2022, 3927-3936
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[216]  arXiv:2110.03010 (cross-list from eess.AS) [pdf, other]
Title: AECMOS: A speech quality assessment metric for echo impairment
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[217]  arXiv:2110.03103 (cross-list from eess.AS) [pdf, other]
Title: Lightweight Speech Enhancement in Unseen Noisy and Reverberant Conditions using KISS-GEV Beamforming
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[218]  arXiv:2110.03114 (cross-list from eess.AS) [pdf, other]
Title: On audio enhancement via online non-negative matrix factorization
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[219]  arXiv:2110.03151 (cross-list from eess.AS) [pdf, other]
Title: Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR
Comments: To appear in ICASSP 2022; System labels (SC and VBx) in Table 1 have been fixed
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[220]  arXiv:2110.03299 (cross-list from eess.AS) [pdf, other]
Title: End-To-End Label Uncertainty Modeling for Speech-based Arousal Recognition Using Bayesian Neural Networks
Comments: ACCEPTED to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[221]  arXiv:2110.03329 (cross-list from eess.AS) [pdf, other]
Title: Towards Universal Neural Vocoding with a Multi-band Excited WaveNet
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[222]  arXiv:2110.03347 (cross-list from eess.AS) [pdf, ps, other]
Title: Cloning one's voice using very limited data in the wild
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[223]  arXiv:2110.03511 (cross-list from eess.AS) [pdf, other]
Title: Peer Collaborative Learning for Polyphonic Sound Event Detection
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[224]  arXiv:2110.03630 (cross-list from eess.AS) [pdf, other]
Title: Towards Faster Continuous Multi-Channel HRTF Measurements Based on Learning System Models
Comments: 5 pages, 4 figures, minor changes compared to v1 after reviewers' feedback, accepted at ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[225]  arXiv:2110.03691 (cross-list from eess.SP) [pdf, other]
Title: Direct design of biquad filter cascades with deep learning by sampling random polynomials
Comments: Accepted to ICASSP 2022
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[226]  arXiv:2110.03715 (cross-list from eess.AS) [pdf, other]
Title: PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[227]  arXiv:2110.03857 (cross-list from eess.AS) [pdf, other]
Title: A study on the efficacy of model pre-training in developing neural text-to-speech system
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[228]  arXiv:2110.03887 (cross-list from eess.AS) [pdf, other]
Title: Environment Aware Text-to-Speech Synthesis
Comments: Accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[229]  arXiv:2110.03894 (cross-list from eess.AS) [pdf, other]
Title: Neural Model Reprogramming with Similarity Based Mapping for Low-Resource Spoken Command Recognition
Comments: Accepted to Interspeech 2023. Code is available at: this https URL. Selected as Best Student Paper Candidate
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
[230]  arXiv:2110.03965 (cross-list from eess.AS) [pdf, other]
Title: Joint Scattering for Automatic Chick Call Recognition
Comments: 5 pages, submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[231]  arXiv:2110.04005 (cross-list from eess.AS) [pdf, other]
Title: KaraSinger: Score-Free Singing Voice Synthesis with VQ-VAE using Mel-spectrograms
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[232]  arXiv:2110.04047 (cross-list from eess.AS) [pdf, other]
Title: TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[233]  arXiv:2110.04056 (cross-list from eess.AS) [pdf, ps, other]
Title: Improving Pseudo-label Training For End-to-end Speech Recognition Using Gradient Mask
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[234]  arXiv:2110.04082 (cross-list from eess.AS) [pdf, other]
Title: A Method for Capturing and Reproducing Directional Reverberation in Six Degrees of Freedom
Comments: This work has been accepted for the I3DA 2021 International Conference and will be submitted to IEEE Xplore Digital Library for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[235]  arXiv:2110.04153 (cross-list from eess.AS) [pdf, other]
Title: Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Comments: Submitted to ICASSP 2022, 5 pages, 2 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[236]  arXiv:2110.04187 (cross-list from eess.AS) [pdf, other]
Title: SCaLa: Supervised Contrastive Learning for End-to-End Speech Recognition
Comments: INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[237]  arXiv:2110.04265 (cross-list from eess.AS) [pdf, other]
Title: A study of the robustness of raw waveform based speaker embeddings under mismatched conditions
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[238]  arXiv:2110.04289 (cross-list from eess.AS) [pdf, other]
Title: Location-based training for multi-channel talker-independent speaker separation
Comments: submitted to ICASSP 22
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[239]  arXiv:2110.04331 (cross-list from eess.AS) [pdf, ps, other]
Title: MusicNet: Compact Convolutional Neural Network for Real-time Background Music Detection
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[240]  arXiv:2110.04378 (cross-list from eess.AS) [pdf, other]
Title: Performance optimizations on deep noise suppression models
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[241]  arXiv:2110.04385 (cross-list from eess.AS) [pdf, other]
Title: Individualized Hear-through For Acoustic Transparency Using PCA-Based Sound Pressure Estimation At The Eardrum
Comments: 5 pages, 5 figures, accepted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[242]  arXiv:2110.04391 (cross-list from eess.AS) [pdf, other]
Title: Aura: Privacy-preserving Augmentation to Improve Test Set Diversity in Speech Enhancement
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[243]  arXiv:2110.04410 (cross-list from eess.AS) [pdf, other]
Title: TitaNet: Neural Model for speaker representation with 1D Depth-wise separable convolutions and global context
Comments: preprint. Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[244]  arXiv:2110.04440 (cross-list from eess.AS) [pdf, other]
Title: Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia Using Convolutional Neural Networks
Comments: 5 pages. arXiv admin note: text overlap with arXiv:2102.07054
Journal-ref: Proceedings of the 2021 International Conference on Multimodal Interaction
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[245]  arXiv:2110.04482 (cross-list from eess.AS) [pdf, other]
Title: Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis
Comments: Accepted to ICASSP 2022. Camera-ready
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[246]  arXiv:2110.04484 (cross-list from eess.AS) [pdf, other]
Title: Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR
Comments: Accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[247]  arXiv:2110.04511 (cross-list from eess.AS) [pdf, other]
Title: Data Augmentation with Locally-time Reversed Speech for Automatic Speech Recognition
Authors: Si-Ioi Ng, Tan Lee
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[248]  arXiv:2110.04584 (cross-list from eess.AS) [pdf, other]
Title: Visually Exploring Multi-Purpose Audio Data
Comments: Presented at MMSP 2021
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Sound (cs.SD)
[249]  arXiv:2110.04585 (cross-list from eess.AS) [pdf, other]
Title: An evaluation of data augmentation methods for sound scene geotagging
Comments: Presented at Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[250]  arXiv:2110.04612 (cross-list from eess.AS) [pdf, other]
Title: Personalized Automatic Speech Recognition Trained on Small Disordered Speech Datasets
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[251]  arXiv:2110.04654 (cross-list from eess.AS) [pdf, other]
Title: Complex Network-Based Approach for Feature Extraction and Classification of Musical Genres
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[252]  arXiv:2110.04692 (cross-list from eess.AS) [pdf, other]
Title: Poformer: A simple pooling transformer for speaker verification
Comments: submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[253]  arXiv:2110.04694 (cross-list from eess.AS) [pdf, other]
Title: Multi-Channel End-to-End Neural Diarization with Distributed Microphones
Comments: Accepted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[254]  arXiv:2110.04775 (cross-list from eess.AS) [pdf, other]
Title: Estimating the confidence of speech spoofing countermeasure
Comments: Work in progress. Comments are welcome. Accepted by ICASSP 2022. Code is available at this https URL. Not all the comments from anonymous reviewers could be addressed within 4 pages; apologies for that
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[255]  arXiv:2110.04791 (cross-list from eess.AS) [pdf, other]
Title: Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain
Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[256]  arXiv:2110.04850 (cross-list from eess.AS) [pdf, other]
Title: Direct source and early reflections localization using deep deconvolution network under reverberant environment
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[257]  arXiv:2110.04908 (cross-list from eess.AS) [pdf, other]
Title: DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation
Comments: ACL 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[258]  arXiv:2110.04948 (cross-list from eess.AS) [pdf, other]
Title: Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy
Comments: Submitted to ICASSP2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[259]  arXiv:2110.05036 (cross-list from eess.AS) [pdf, other]
Title: Multi-View Self-Attention Based Transformer for Speaker Recognition
Comments: Paper to appear at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[260]  arXiv:2110.05249 (cross-list from eess.AS) [pdf, other]
Title: A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation
Comments: Accepted to ASRU2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[261]  arXiv:2110.05267 (cross-list from eess.AS) [pdf, other]
Title: Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition
Comments: 5 pages, 7 figures, Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[262]  arXiv:2110.05431 (cross-list from eess.AS) [pdf, other]
Title: On the invertibility of a voice privacy system using embedding alignement
Authors: Pierre Champion (MULTISPEECH, LIUM), Thomas Thebaud (LIUM), Gaël Le Lan, Anthony Larcher (LIUM), Denis Jouvet (MULTISPEECH)
Journal-ref: ASRU 2021 - IEEE Automatic Speech Recognition and Understanding Workshop, Dec 2021, Cartagena, Colombia
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[263]  arXiv:2110.05632 (cross-list from stat.AP) [pdf, other]
Title: Wind-robust sound event detection and denoising for bioacoustics
Comments: 34 pages, 5 figures, 2 supplementary figures
Subjects: Applications (stat.AP); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[264]  arXiv:2110.05695 (cross-list from eess.AS) [pdf, ps, other]
Title: The Mirrornet : Learning Audio Synthesizer Controls Inspired by Sensorimotor Interaction
Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[265]  arXiv:2110.05745 (cross-list from eess.AS) [pdf, other]
Title: VarArray: Array-Geometry-Agnostic Continuous Speech Separation
Comments: 5 pages, 1 figure, 3 tables, submitted to ICASSP 2022; updated reference information of [33]
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[266]  arXiv:2110.05948 (cross-list from eess.SP) [pdf, other]
Title: Denoising Diffusion Gamma Models
Comments: arXiv admin note: substantial text overlap with arXiv:2106.07582
Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[267]  arXiv:2110.05994 (cross-list from eess.AS) [pdf, other]
Title: Word Order Does Not Matter For Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[268]  arXiv:2110.06126 (cross-list from eess.AS) [pdf, other]
Title: Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection
Comments: 5 pages, 2 figures, 4 tables. Submitted to the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[269]  arXiv:2110.06304 (cross-list from eess.AS) [pdf, other]
Title: Generalized Time Domain Velocity Vector
Comments: Submitted
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[270]  arXiv:2110.06306 (cross-list from eess.AS) [pdf, other]
Title: Fine-grained style control in Transformer-based Text-to-speech Synthesis
Comments: Accepted in ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[271]  arXiv:2110.06309 (cross-list from eess.AS) [pdf, other]
Title: Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition
Comments: Accepted to ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[272]  arXiv:2110.06428 (cross-list from eess.AS) [pdf, other]
Title: All-neural beamformer for continuous speech separation
Comments: 5 pages, 3 figures, 2 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[273]  arXiv:2110.06434 (cross-list from eess.AS) [pdf, other]
Title: DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding
Comments: Accepted to ASRU 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[274]  arXiv:2110.06440 (cross-list from eess.AS) [pdf, other]
Title: SDR -- Medium Rare with Fast Computations
Authors: Robin Scheibler
Comments: 5 pages, 3 figures, 2 tables. Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[275]  arXiv:2110.06546 (cross-list from eess.AS) [pdf, other]
Title: A Melody-Unsupervision Model for Singing Voice Synthesis
Comments: ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[276]  arXiv:2110.06691 (cross-list from eess.AS) [pdf, other]
Title: Diverse Audio Captioning via Adversarial Training
Comments: 5 pages, 1 figure, accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[277]  arXiv:2110.07116 (cross-list from eess.AS) [pdf, other]
Title: Auxiliary Loss of Transformer with Residual Connection for End-to-End Speaker Diarization
Comments: Submitted to ICASSP 2022, equal contribution from first two authors
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[278]  arXiv:2110.07124 (cross-list from eess.AS) [pdf, other]
Title: Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
Comments: 5 pages, 3 figures, accepted for publication in IEEE ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[279]  arXiv:2110.07192 (cross-list from eess.AS) [pdf, other]
Title: Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech
Comments: Accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[280]  arXiv:2110.07205 (cross-list from eess.AS) [pdf, other]
Title: SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Comments: Accepted by ACL 2022 main conference
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[281]  arXiv:2110.07216 (cross-list from eess.AS) [pdf, other]
Title: FedSpeech: Federated Text-to-Speech with Continual Learning
Comments: Accepted by IJCAI 2021
Journal-ref: 2021. Main Track. Pages 3829-3835
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[282]  arXiv:2110.07419 (cross-list from eess.AS) [pdf, other]
Title: Student-t Networks for Melody Estimation
Authors: Udhav Gupta, Avi, Bhavesh Jain
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[283]  arXiv:2110.07468 (cross-list from eess.AS) [pdf, other]
Title: SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
Comments: Accepted by ACM Multimedia 2022
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[284]  arXiv:2110.07537 (cross-list from eess.AS) [pdf, ps, other]
Title: Toward Degradation-Robust Voice Conversion
Comments: To appear in the proceedings of ICASSP 2022, equal contribution from first two authors
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[285]  arXiv:2110.07957 (cross-list from eess.AS) [pdf, other]
Title: Don't speak too fast: The impact of data bias on self-supervised speech models
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[286]  arXiv:2110.08243 (cross-list from eess.AS) [pdf, other]
Title: Neural Dubber: Dubbing for Videos According to Scripts
Comments: Accepted by NeurIPS 2021; Project page at this https URL
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[287]  arXiv:2110.08545 (cross-list from eess.AS) [pdf, other]
Title: A Unified Speaker Adaptation Approach for ASR
Comments: Accepted by EMNLP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[288]  arXiv:2110.08583 (cross-list from eess.AS) [pdf, ps, other]
Title: ASR4REAL: An extended benchmark for speech models
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[289]  arXiv:2110.08598 (cross-list from eess.AS) [pdf, other]
Title: A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer
Comments: Accepted to ICASSP 2022. Code is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD)
[290]  arXiv:2110.08813 (cross-list from eess.AS) [pdf, other]
Title: VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis
Comments: 5 pages, ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[291]  arXiv:2110.08862 (cross-list from eess.AS) [pdf, other]
Title: Deep Learning Based EDM Subgenre Classification using Mel-Spectrogram and Tempogram Features
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[292]  arXiv:2110.09000 (cross-list from eess.AS) [pdf, other]
Title: Supervised Metric Learning for Music Structure Features
Comments: This paper was accepted and presented at ISMIR 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[293]  arXiv:2110.09019 (cross-list from eess.AS) [pdf, ps, other]
Title: Similarity-and-Independence-Aware Beamformer with Iterative Casting and Boost Start for Target Source Extraction Using Reference
Authors: Atsuo Hiroe
Comments: Accepted for publication as a regular paper in the IEEE Open Journal of Signal Processing (2021)
Journal-ref: A. Hiroe, "Similarity-and-Independence-Aware Beam-former with Iterative Casting and Boost Start for Target Source Extraction Using Reference," in IEEE Open Journal of Signal Processing, 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[294]  arXiv:2110.09150 (cross-list from eess.AS) [pdf, other]
Title: Tackling the Score Shift in Cross-Lingual Speaker Verification by Exploiting Language Information
Comments: proceedings of ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[295]  arXiv:2110.09625 (cross-list from eess.AS) [pdf, other]
Title: Personalized Speech Enhancement: New Models and Comprehensive Evaluation
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[296]  arXiv:2110.09890 (cross-list from eess.AS) [pdf, other]
Title: Multi-Modal Pre-Training for Automated Speech Recognition
Comments: Presented at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[297]  arXiv:2110.09923 (cross-list from eess.AS) [pdf, ps, other]
Title: Speech Enhancement-assisted Voice Conversion in Noisy Environments
Journal-ref: APSIPA 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[298]  arXiv:2110.09924 (cross-list from eess.AS) [pdf, ps, other]
Title: Speech Enhancement Based on Cyclegan with Noise-informed Training
Journal-ref: ISCSLP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[299]  arXiv:2110.09928 (cross-list from eess.AS) [pdf, other]
Title: CycleFlow: Purify Information Factors by Cycle Loss
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[300]  arXiv:2110.09930 (cross-list from eess.AS) [pdf, other]
Title: Speech Representation Learning Through Self-supervised Pretraining And Multi-task Finetuning
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[301]  arXiv:2110.09958 (cross-list from eess.AS) [pdf, other]
Title: The Cocktail Fork Problem: Three-Stem Audio Separation for Real-World Soundtracks
Comments: Accepted to ICASSP2022. For resources and examples, see this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[302]  arXiv:2110.10026 (cross-list from eess.AS) [pdf, other]
Title: Private Language Model Adaptation for Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[303]  arXiv:2110.10139 (cross-list from eess.AS) [pdf, other]
Title: Chunked Autoregressive GAN for Conditional Waveform Synthesis
Comments: Published as a conference paper at ICLR 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[304]  arXiv:2110.10326 (cross-list from eess.AS) [pdf, other]
Title: Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion
Comments: Accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[305]  arXiv:2110.10330 (cross-list from eess.AS) [pdf, other]
Title: One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[306]  arXiv:2110.10812 (cross-list from eess.AS) [pdf, other]
Title: REAL-M: Towards Speech Separation on Real Mixtures
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[307]  arXiv:2110.11144 (cross-list from eess.AS) [pdf, other]
Title: RCT: Random Consistency Training for Semi-supervised Sound Event Detection
Comments: Preprint for interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[308]  arXiv:2110.11438 (cross-list from eess.AS) [pdf, ps, other]
Title: Objective Measures of Perceptual Audio Quality Reviewed: An Evaluation of Their Application Domain Dependence
Journal-ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[309]  arXiv:2110.11479 (cross-list from eess.AS) [pdf, other]
Title: Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[310]  arXiv:2110.12304 (cross-list from eess.AS) [pdf, other]
Title: A Study of Acoustic Features in Arabic Speaker Identification under Noisy Environmental Conditions
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[311]  arXiv:2110.12676 (cross-list from eess.AS) [pdf, other]
Title: Controllable and Interpretable Singing Voice Decomposition via Assem-VC
Comments: Accepted to NeurIPS Workshop on ML for Creativity and Design 2021 (Oral)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[312]  arXiv:2110.12820 (cross-list from eess.AS) [pdf, ps, other]
Title: On Synchronization of Wireless Acoustic Sensor Networks in the Presence of Time-varying Sampling Rate Offsets and Speaker Changes
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[313]  arXiv:2110.13125 (cross-list from eess.AS) [pdf, ps, other]
Title: Automatic Impact-sounding Acoustic Inspection of Concrete Structure
Journal-ref: 10th International Conference on Structural Health Monitoring of Intelligent Infrastructure, SHMII 10, 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[314]  arXiv:2110.13586 (cross-list from eess.AS) [pdf, other]
Title: Towards Audio Domain Adaptation for Acoustic Scene Classification using Disentanglement Learning
Comments: submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[315]  arXiv:2110.13653 (cross-list from eess.AS) [pdf, other]
Title: Learning Speaker Representation with Semi-supervised Learning approach for Speaker Profiling
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[316]  arXiv:2110.14139 (cross-list from eess.AS) [pdf, other]
Title: Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions
Comments: 5 pages, 3 figures, accepted by IEEE WASPAA 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[317]  arXiv:2110.14142 (cross-list from eess.AS) [pdf, other]
Title: Separating Long-Form Speech with Group-Wise Permutation Invariant Training
Comments: 5 pages, 3 figures, 3 tables, submitted to IEEE ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[318]  arXiv:2110.14838 (cross-list from eess.AS) [pdf, other]
Title: Continuous Speech Separation with Recurrent Selective Attention Network
Comments: Submitted to ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[319]  arXiv:2110.15018 (cross-list from eess.AS) [pdf, other]
Title: TorchAudio: Building Blocks for Audio and Speech Processing
Comments: Accepted by ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[320]  arXiv:2110.15581 (cross-list from eess.AS) [pdf, other]
Title: SA-SDR: A novel loss function for separation of meeting style data
Comments: accepted at ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[321]  arXiv:2110.15593 (cross-list from q-bio.QM) [pdf, ps, other]
Title: Towards automatic detection and classification of orca (Orcinus orca) calls using cross-correlation methods
Comments: 26 pages, 6 figures
Subjects: Quantitative Methods (q-bio.QM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[322]  arXiv:2110.15684 (cross-list from eess.AS) [pdf, other]
Title: Fusing ASR Outputs in Joint Training for Speech Emotion Recognition
Comments: Accepted for ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)