Audio and Speech Processing

Authors and titles for eess.AS in Sep 2022

[ total of 183 entries: 1-183 ]
[1]  arXiv:2209.00423 [pdf, other]
Title: Spoofing-Aware Attention based ASV Back-end with Multiple Enrollment Utterances and a Sampling Strategy for the SASV Challenge 2022
Comments: Accepted by InterSpeech2022
Subjects: Audio and Speech Processing (eess.AS)
[2]  arXiv:2209.00485 [pdf, other]
Title: Joint Speaker Encoder and Neural Back-end Model for Fully End-to-End Automatic Speaker Verification with Multiple Enrollment Utterances
Comments: Submitted to TASLP
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[3]  arXiv:2209.00506 [pdf, other]
Title: On the potential of jointly-optimised solutions to spoofing attack detection and automatic speaker verification
Comments: Accepted to IberSPEECH 2022 Conference
Subjects: Audio and Speech Processing (eess.AS)
[4]  arXiv:2209.00619 [pdf, other]
Title: diaLogic: Non-Invasive Speaker-Focused Data Acquisition for Team Behavior Modeling
Subjects: Audio and Speech Processing (eess.AS)
[5]  arXiv:2209.00733 [pdf]
Title: A Wavelet Transform Based Scheme to Extract Speech Pitch and Formant Frequencies
Journal-ref: 2019 7th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[6]  arXiv:2209.00805 [pdf, other]
Title: Multi-scale temporal-frequency attention for music source separation
Subjects: Audio and Speech Processing (eess.AS)
[7]  arXiv:2209.00934 [pdf, other]
Title: TB or not TB? Acoustic cough analysis for tuberculosis classification
Comments: Accepted for publication at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[8]  arXiv:2209.00937 [pdf, other]
Title: Inverse-free Online Independent Vector Analysis with Flexible Iterative Source Steering
Comments: 5 pages, 2 figures. Submitted to APSIPA 2022
Subjects: Audio and Speech Processing (eess.AS)
[9]  arXiv:2209.01702 [pdf, other]
Title: Time-domain speech super-resolution with GAN based modeling for telephony speaker verification
Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Audio and Speech Processing (eess.AS)
[10]  arXiv:2209.01762 [pdf, other]
Title: Movement Detection of Tongue and Related Body Parts Using IR-UWB Radar
Comments: Submitted to the 13th International Conference on ICT Convergence (ICTC)
Subjects: Audio and Speech Processing (eess.AS)
[11]  arXiv:2209.01802 [pdf, other]
Title: Sound Event Localization and Detection for Real Spatial Sound Scenes: Event-Independent Network and Data Augmentation Chains
Comments: Submitted to DCASE 2022 Workshop. Code is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12]  arXiv:2209.01978 [pdf, other]
Title: Investigation into Target Speaking Rate Adaptation for Voice Conversion
Comments: Accepted to INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS)
[13]  arXiv:2209.04175 [pdf, other]
Title: Streaming Target-Speaker ASR with Neural Transducer
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[14]  arXiv:2209.04473 [pdf, other]
Title: Reconstructing the Dynamic Directivity of Unconstrained Speech
Comments: 19 pages, 8 figures, 3 tables. Internally-reviewed manuscript approved for public release by Facebook Inc. Research uses a proprietary dataset not available for public release
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[15]  arXiv:2209.04974 [pdf, other]
Title: VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition
Comments: 6 pages, 2 figures, 3 tables, v2: Appendix A has been added
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[16]  arXiv:2209.05110 [pdf, other]
Title: Continuous head-related transfer function representation based on hyperspherical harmonics
Authors: Adam Szwajcowski
Comments: Submitted to Archives of Acoustics 4.06.2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17]  arXiv:2209.05161 [pdf, other]
Title: How Much Does Prosody Help Turn-taking? Investigations using Voice Activity Projection Models
Comments: SIGDIAL 2022 Best Paper Award Winner
Subjects: Audio and Speech Processing (eess.AS)
[18]  arXiv:2209.05273 [pdf, other]
Title: The 2022 Far-field Speaker Verification Challenge: Exploring domain mismatch and semi-supervised learning under the far-field scenario
Subjects: Audio and Speech Processing (eess.AS)
[19]  arXiv:2209.05281 [pdf, other]
Title: Modeling Dependent Structure for Utterances in ASR Evaluation
Authors: Zhe Liu, Fuchun Peng
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[20]  arXiv:2209.05735 [pdf, other]
Title: Learning ASR pathways: A sparse multilingual ASR model
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[21]  arXiv:2209.06058 [pdf, other]
Title: Streaming End-to-End Multilingual Speech Recognition with Joint Language Identification
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[22]  arXiv:2209.06265 [pdf, other]
Title: Automated detection of pronunciation errors in non-native English speech employing deep learning
Authors: Daniel Korzekwa
Comments: PhD Thesis, in English + extended summary in Polish
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Other Quantitative Biology (q-bio.OT)
[23]  arXiv:2209.06337 [pdf, other]
Title: Deep Speech Synthesis from Articulatory Representations
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[24]  arXiv:2209.06410 [pdf, other]
Title: A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25]  arXiv:2209.06581 [pdf, ps, other]
Title: Applying wav2vec2 for Speech Recognition on Bengali Common Voices Dataset
Comments: 5 pages
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[26]  arXiv:2209.06789 [pdf, other]
Title: Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis
Comments: Submitted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS)
[27]  arXiv:2209.06913 [pdf, other]
Title: ESSumm: Extractive Speech Summarization from Untranscribed Meeting
Authors: Jun Wang
Comments: Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[28]  arXiv:2209.07180 [pdf, ps, other]
Title: Open Challenges in Synthetic Speech Detection
Journal-ref: in IEEE International Workshop on Information Forensics and Security (WIFS), December 12-16, 2022, Shanghai, China, pp.1-6
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[29]  arXiv:2209.07196 [pdf, other]
Title: Environment Classification via Blind Roomprints Estimation
Journal-ref: in IEEE International Workshop on Information Forensics and Security (WIFS), December 12-16, 2022, Shanghai, China, pp.1-6
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[30]  arXiv:2209.07548 [pdf, other]
Title: Open Set Recognition For Music Genre Classification
Comments: 9 pages, 5 figures, 4 tables
Subjects: Audio and Speech Processing (eess.AS); Optimization and Control (math.OC)
[31]  arXiv:2209.08119 [pdf, other]
Title: An Automatic Speech Recognition System for Bengali Language based on Wav2Vec2 and Transfer Learning
Comments: BUET DL Sprint, 4 pages
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[32]  arXiv:2209.08326 [pdf, other]
Title: Parameter-Efficient Conformers via Sharing Sparsely-Gated Experts for End-to-End Speech Recognition
Comments: accepted in INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[33]  arXiv:2209.08379 [pdf, other]
Title: Representation Learning Strategies to Model Pathological Speech: Effect of Multiple Spectral Resolutions
Comments: 7 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[34]  arXiv:2209.09512 [pdf]
Title: A Combined Model for Noise Reduction of Lung Sound Signals Based on Empirical Mode Decomposition and Artificial Neural Network
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[35]  arXiv:2209.09756 [pdf, other]
Title: ESPnet-ONNX: Bridging a Gap Between Research and Production
Comments: Accepted to APSIPA ASC 2022
Subjects: Audio and Speech Processing (eess.AS)
[36]  arXiv:2209.09967 [src]
Title: Language-based Audio Retrieval Task in DCASE 2022 Challenge
Comments: Update for arXiv:2206.06108 mistakenly submitted as a new article
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[37]  arXiv:2209.10088 [pdf, other]
Title: Boosting Star-GANs for Voice Conversion with Contrastive Discriminator
Comments: 12 pages, 3 figures, Accepted by ICONIP 2022
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[38]  arXiv:2209.10147 [pdf, ps, other]
Title: The ReturnZero System for VoxCeleb Speaker Recognition Challenge 2022
Comments: 4 pages, 4 tables, technical report
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[39]  arXiv:2209.10357 [pdf, other]
Title: GIST-AiTeR System for the Diarization Task of the 2022 VoxCeleb Speaker Recognition Challenge
Comments: 2022 VoxSRC Track4
Subjects: Audio and Speech Processing (eess.AS)
[40]  arXiv:2209.10446 [pdf, ps, other]
Title: Mandarin Singing Voice Synthesis with Denoising Diffusion Probabilistic Wasserstein GAN
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[41]  arXiv:2209.10479 [pdf, other]
Title: An Initial study on Birdsong Re-synthesis Using Neural Vocoders
Comments: To appear in 24th International Conference on Speech and Computer (SPECOM), GURUGRAM, INDIA
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[42]  arXiv:2209.10591 [pdf, other]
Title: Assessing ASR Model Quality on Disordered Speech using BERTScore
Comments: Accepted to Interspeech 2022 Workshop on Speech for Social Good
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
[43]  arXiv:2209.10890 [pdf, other]
Title: EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models
Journal-ref: Interspeech 2022, 823-827 (2022)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[44]  arXiv:2209.11061 [pdf, other]
Title: Cross-domain Voice Activity Detection with Self-Supervised Representations
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[45]  arXiv:2209.11296 [pdf, other]
Title: Isolation performance metrics for personal sound zone reproduction systems
Subjects: Audio and Speech Processing (eess.AS)
[46]  arXiv:2209.11433 [pdf, other]
Title: The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022
Comments: System description of VoxSRC 2022: track 1, 2 and 4
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47]  arXiv:2209.11494 [pdf, other]
Title: MMS-MSG: A Multi-purpose Multi-Speaker Mixture Signal Generator
Comments: Accepted at IWAENC 2022
Subjects: Audio and Speech Processing (eess.AS)
[48]  arXiv:2209.11666 [pdf, other]
Title: Stereo InSE-NET: Stereo Audio Quality Predictor Transfer Learned from Mono InSE-NET
Comments: Accepted to 153rd Audio Engineering Society (AES), New York, NY, USA, October 2022
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[49]  arXiv:2209.11866 [pdf, other]
Title: ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed
Comments: Audio samples: this https URL; Code: this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50]  arXiv:2209.11969 [pdf, other]
Title: NWPU-ASLP System for the VoicePrivacy 2022 Challenge
Comments: VoicePrivacy 2022 Challenge
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[51]  arXiv:2209.12002 [pdf, other]
Title: Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting
Comments: Accepted by Interspeech 2022. arXiv admin note: text overlap with arXiv:2202.05744
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[52]  arXiv:2209.12702 [pdf, other]
Title: End-to-End Lyrics Recognition with Self-supervised Learning
Comments: 4 pages, 2 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[53]  arXiv:2209.12826 [pdf, other]
Title: Multi-encoder attention-based architectures for sound recognition with partial visual assistance
Comments: Submitted to EURASIP Journal on Audio, Speech, and Music Processing
Journal-ref: EURASIP Journal on Audio, Speech, and Music Processing; 2022; 25
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[54]  arXiv:2209.12843 [pdf, other]
Title: Impact of temporal resolution on convolutional recurrent networks for audio tagging and sound event detection
Comments: Submitted to DCASE 2022 Workshop
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[55]  arXiv:2209.13112 [pdf, other]
Title: Automated Sex Classification of Children's Voices and Changes in Differentiating Factors with Age
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56]  arXiv:2209.13146 [pdf, other]
Title: Predicting Affective Vocal Bursts with Finetuned wav2vec 2.0
Subjects: Audio and Speech Processing (eess.AS)
[57]  arXiv:2209.13211 [pdf, other]
Title: Hyperbolic Timbre Embedding for Musical Instrument Sound Synthesis Based on Variational Autoencoders
Comments: 8 pages, 4 figures, to be published in Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2022
Journal-ref: 2022 Asia Pacific Signal and Information Processing Association Annual Summit and Conference
Subjects: Audio and Speech Processing (eess.AS)
[58]  arXiv:2209.14150 [pdf, other]
Title: Speech Enhancement Using Self-Supervised Pre-Trained Model and Vector Quantization
Comments: Accepted to APSIPA ASC 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[59]  arXiv:2209.14275 [pdf, other]
Title: Audio Retrieval with WavText5K and CLAP Training
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI)
[60]  arXiv:2209.14335 [pdf]
Title: Text Independent Speaker Identification System for Access Control
Comments: 4 pages
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[61]  arXiv:2209.15032 [pdf, other]
Title: Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0
Comments: This preprint is a pre-review version of the paper and does not contain any post-submission improvements or corrections. The Version of Record of this contribution is published in the proceedings of the International Conference on Text, Speech, and Dialogue (TSD 2022), LNAI volume 13502, and is available online at this https URL
Journal-ref: International Conference on Text, Speech, and Dialogue (TSD 2022), LNAI volume 13502
Subjects: Audio and Speech Processing (eess.AS)
[62]  arXiv:2209.15174 [pdf, other]
Title: Music Source Separation with Band-split RNN
Authors: Yi Luo, Jianwei Yu
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[63]  arXiv:2209.15449 [pdf, other]
Title: End-to-End Label Uncertainty Modeling in Speech Emotion Recognition using Bayesian Neural Networks and Label Distribution Learning
Comments: arXiv admin note: text overlap with arXiv:2207.12135
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[64]  arXiv:2209.15472 [pdf, other]
Title: Binaural Speech Enhancement Using STOI-Optimal Masks
Comments: Accepted at IWAENC 2022
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[65]  arXiv:2209.02768 (cross-list from eess.IV) [pdf]
Title: Rapid dynamic speech imaging at 3 Tesla using combination of a custom vocal tract coil, variable density spirals and manifold regularization
Comments: 30 pages, 10 figures
Subjects: Image and Video Processing (eess.IV); Audio and Speech Processing (eess.AS); Medical Physics (physics.med-ph)
[66]  arXiv:2209.11896 (cross-list from eess.IV) [pdf, other]
Title: Unsupervised active speaker detection in media content using cross-modal information
Comments: Under review at IEEE Transactions on Image Processing
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[67]  arXiv:2209.00130 (cross-list from cs.SD) [pdf, other]
Title: Evaluating generative audio systems and their metrics
Comments: Accepted at ISMIR 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[68]  arXiv:2209.00182 (cross-list from cs.SD) [pdf, other]
Title: What is missing in deep music generation? A study of repetition and structure in popular music
Comments: In Proceedings of the 23rd Int. Society for Music Information Retrieval (ISMIR) 2022
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[69]  arXiv:2209.00260 (cross-list from cs.CL) [pdf, other]
Title: Deep Sparse Conformer for Speech Recognition
Authors: Xianchao Wu
Comments: 5 pages, 1 figure
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70]  arXiv:2209.00261 (cross-list from cs.CL) [pdf, other]
Title: Attention Enhanced Citrinet for Speech Recognition
Authors: Xianchao Wu
Comments: 5 pages, 3 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[71]  arXiv:2209.00277 (cross-list from cs.CV) [pdf, other]
Title: Video-Guided Curriculum Learning for Spoken Video Grounding
Comments: Accepted by ACM MM 2022
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[72]  arXiv:2209.00291 (cross-list from cs.SD) [pdf, other]
Title: Generating Coherent Drum Accompaniment With Fills And Improvisations
Comments: 8 pages, 7 figures, 23rd International Society for Music Information Retrieval Conference (ISMIR 2022), Bengaluru, India
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[73]  arXiv:2209.00353 (cross-list from cs.SD) [pdf, other]
Title: AccoMontage2: A Complete Harmonization and Accompaniment Arrangement System
Comments: Accepted by ISMIR 2022
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[74]  arXiv:2209.00642 (cross-list from cs.CV) [pdf, other]
Title: Lip-to-Speech Synthesis for Arbitrary Speakers in the Wild
Comments: Accepted in ACM-MM 2022, 9 pages, 2 pages supplementary, 7 Figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75]  arXiv:2209.01250 (cross-list from cs.CL) [pdf, other]
Title: Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[76]  arXiv:2209.01374 (cross-list from cs.SD) [pdf]
Title: Identify The Beehive Sound Using Deep Learning
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[77]  arXiv:2209.01478 (cross-list from cs.SD) [pdf, other]
Title: Equivariant Self-Supervision for Musical Tempo Estimation
Authors: Elio Quinton
Comments: Accepted at ISMIR 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[78]  arXiv:2209.01751 (cross-list from cs.SD) [pdf, other]
Title: Exploiting Pre-trained Feature Networks for Generative Adversarial Networks in Audio-domain Loop Generation
Comments: Accepted at ISMIR 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79]  arXiv:2209.01768 (cross-list from cs.MM) [pdf, other]
Title: Predict-and-Update Network: Audio-Visual Speech Recognition Inspired by Human Speech Perception
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80]  arXiv:2209.01996 (cross-list from cs.SD) [pdf, other]
Title: Bridging Music and Text with Crowdsourced Music Comments: A Sequence-to-Sequence Framework for Thematic Music Comments Generation
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[81]  arXiv:2209.02030 (cross-list from cs.CL) [pdf, other]
Title: Distilling the Knowledge of BERT for CTC-based ASR
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82]  arXiv:2209.02604 (cross-list from cs.MM) [pdf, other]
Title: Make Acoustic and Visual Cues Matter: CH-SIMS v2.0 Dataset and AV-Mixup Consistent Module
Comments: 16 pages, 7 figures, accepted by ICMI 2022
Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83]  arXiv:2209.02696 (cross-list from cs.SD) [pdf, other]
Title: Instrument Separation of Symbolic Music by Explicitly Guided Diffusion Model
Comments: Submitted to NeurIPS 2022 Workshop on Machine Learning for Creativity and Design
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[84]  arXiv:2209.02785 (cross-list from cs.SD) [pdf, other]
Title: Read it to me: An emotionally aware Speech Narration Application
Authors: Rishibha Bansal
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[85]  arXiv:2209.02855 (cross-list from cs.SD) [pdf, other]
Title: The Role of Vocal Persona in Natural and Synthesized Speech
Comments: To be published in the proceedings of the 17th IEEE International Conference on Automatic Face and Gesture Recognition as part of the Workshop on Socially Interactive Human-like Virtual Agents (SIVA '23)
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS); Systems and Control (eess.SY)
[86]  arXiv:2209.02871 (cross-list from cs.SD) [pdf, other]
Title: Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments
Comments: Camera Ready for Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022
Journal-ref: The 23rd International Society for Music Information Retrieval Conference, 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[87]  arXiv:2209.03143 (cross-list from cs.SD) [pdf, other]
Title: AudioLM: a Language Modeling Approach to Audio Generation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[88]  arXiv:2209.03275 (cross-list from cs.SD) [pdf, other]
Title: Multimodal Speech Enhancement Using Burst Propagation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[89]  arXiv:2209.03338 (cross-list from cs.MM) [pdf, other]
Title: ESSYS* Sharing #UC: An Emotion-driven Audiovisual Installation
Comments: Paper to be published in 2022 IEEE VIS Arts Program (VISAP 2022). For the associated supplementary materials, see this https URL
Journal-ref: 2022 IEEE VIS Arts Program (VISAP 2022)
Subjects: Multimedia (cs.MM); Human-Computer Interaction (cs.HC); Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90]  arXiv:2209.03711 (cross-list from cs.SD) [pdf, other]
Title: What Did I Just Hear? Detecting Pornographic Sounds in Adult Videos Using Neural Networks
Comments: Published in AudioMostly 2022, ACM
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[91]  arXiv:2209.03727 (cross-list from cs.SD) [pdf, other]
Title: Developing a multi-variate prediction model for the detection of COVID-19 from Crowd-sourced Respiratory Voice Data
Comments: 9 pages, 6 figures, poster presented at the European Respiratory Society, Barcelona, September 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[92]  arXiv:2209.03787 (cross-list from cs.CL) [pdf, other]
Title: Goodness of Pronunciation Pipelines for OOV Problem
Authors: Ankit Grover
Comments: 47 pages, 24 Figures, 1 Table
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93]  arXiv:2209.03807 (cross-list from cs.SD) [pdf, other]
Title: Hardware Accelerator and Neural Network Co-Optimization for Ultra-Low-Power Audio Processing Devices
Comments: Accepted Version for: EUROMICRO DSD 2022
Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[94]  arXiv:2209.03901 (cross-list from cs.SD) [pdf, other]
Title: Dyadic Interaction Assessment from Free-living Audio for Depression Severity Assessment
Comments: Accepted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[95]  arXiv:2209.03952 (cross-list from cs.SD) [pdf, other]
Title: TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation
Comments: in IEEE ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96]  arXiv:2209.04062 (cross-list from cs.CL) [pdf, other]
Title: Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM
Comments: Accepted in Interspeech2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97]  arXiv:2209.04071 (cross-list from cs.AI) [pdf]
Title: Audio Analytics-based Human Trafficking Detection Framework for Autonomous Vehicles
Subjects: Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98]  arXiv:2209.04075 (cross-list from cs.SD) [pdf]
Title: Improving the Environmental Perception of Autonomous Vehicles using Deep Learning-based Audio Classification
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[99]  arXiv:2209.04077 (cross-list from cs.SD) [pdf, other]
Title: Prediction method of Soundscape Impressions using Environmental Sounds and Aerial Photographs
Comments: Submitted to APSIPA ASC 2022
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[100]  arXiv:2209.04093 (cross-list from cs.CV) [pdf, other]
Title: Learning Audio-Visual embedding for Person Verification in the Wild
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[101]  arXiv:2209.04109 (cross-list from cs.SD) [pdf, other]
Title: MATT: A Multiple-instance Attention Mechanism for Long-tail Music Genre Classification
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[102]  arXiv:2209.04167 (cross-list from cs.SD) [pdf, other]
Title: Overlapped speech and gender detection with WavLM pre-trained features
Comments: Submitted and accepted to Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[103]  arXiv:2209.04360 (cross-list from cs.SD) [pdf, other]
Title: A Semi-Supervised Algorithm for Improving the Consistency of Crowdsourced Datasets: The COVID-19 Case Study on Respiratory Disorder Classification
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[104]  arXiv:2209.04406 (cross-list from q-bio.NC) [pdf, other]
Title: Longitudinal Acoustic Speech Tracking Following Pediatric Traumatic Brain Injury
Subjects: Neurons and Cognition (q-bio.NC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105]  arXiv:2209.04530 (cross-list from cs.SD) [pdf, other]
Title: DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[106]  arXiv:2209.04547 (cross-list from cs.CR) [pdf, other]
Title: Defend Data Poisoning Attacks on Voice Authentication
Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107]  arXiv:2209.04687 (cross-list from cs.SD) [pdf, other]
Title: Pay Attention to Hard Trials
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108]  arXiv:2209.05900 (cross-list from cs.SD) [pdf, other]
Title: Binaural Signal Representations for Joint Sound Event Detection and Acoustic Scene Classification
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[109]  arXiv:2209.05978 (cross-list from cs.LG) [pdf, other]
Title: A Distributed Acoustic Sensor System for Intelligent Transportation using Deep Learning
Comments: 9 pages, 4 figures
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[110]  arXiv:2209.06054 (cross-list from cs.SD) [pdf, other]
Title: SongDriver: Real-time Music Accompaniment Generation without Logical Latency nor Exposure Bias
Comments: Zihao Wang and Qihao Liang contributed equally to the paper and share co-first authorship. This paper has been accepted by ACM Multimedia 2022, oral session, full paper (main track)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[111]  arXiv:2209.06085 (cross-list from cs.CE) [pdf, other]
Title: Acoustic-Linguistic Features for Modeling Neurological Task Score in Alzheimer's
Comments: The paper has been accepted to Pacific Symposium on Biocomputing © [2022] World Scientific Publishing Co., Singapore, this http URL and is currently being camera-readied
Subjects: Computational Engineering, Finance, and Science (cs.CE); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112]  arXiv:2209.06096 (cross-list from cs.CL) [pdf, other]
Title: Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Comments: Accepted for publication in Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113]  arXiv:2209.06358 (cross-list from cs.SD) [pdf, other]
Title: Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset
Comments: Preprint; accepted for Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[114]  arXiv:2209.06360 (cross-list from cs.SD) [pdf, other]
Title: I2CR: Improving Noise Robustness on Keyword Spotting Using Inter-Intra Contrastive Regularization
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115]  arXiv:2209.06434 (cross-list from cs.SD) [pdf, other]
Title: ConvNeXt Based Neural Network for Audio Anti-Spoofing
Comments: 6 pages
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[116]  arXiv:2209.06484 (cross-list from cs.SD) [pdf, other]
Title: ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS
Comments: Published in IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[117]  arXiv:2209.06496 (cross-list from cs.MM) [pdf, other]
Title: CCOM-HuQin: an Annotated Multimodal Chinese Fiddle Performance Dataset
Comments: 14 pages, 11 figures
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[118]  arXiv:2209.06633 (cross-list from cs.CL) [pdf, other]
Title: Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings
Comments: Accepted in INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[119]  arXiv:2209.06987 (cross-list from cs.SD) [pdf, other]
Title: Non-Parallel Voice Conversion for ASR Augmentation
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[120]  arXiv:2209.07140 (cross-list from cs.SD) [pdf, other]
Title: Beat Transformer: Demixed Beat and Downbeat Tracking with Dilated Self-Attention
Comments: Accepted by ISMIR 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121]  arXiv:2209.07144 (cross-list from cs.SD) [pdf, other]
Title: Domain Adversarial Training on Conditional Variational Auto-Encoder for Controllable Music Generation
Comments: Accepted by ISMIR 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122]  arXiv:2209.07302 (cross-list from cs.SD) [pdf, other]
Title: MVNet: Memory Assistance and Vocal Reinforcement Network for Speech Enhancement
Comments: ICONIP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123]  arXiv:2209.07384 (cross-list from cs.SD) [pdf, other]
Title: Self-Supervised Attention Networks and Uncertainty Loss Weighting for Multi-Task Emotion Recognition on Vocal Bursts
Comments: 4 pages, 1 figure, accepted at The 2022 ACII Affective Vocal Burst Workshop & Challenge (A-VB)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[124]  arXiv:2209.07424 (cross-list from cs.CL) [pdf, other]
Title: CMSBERT-CLR: Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations
Comments: Accepted by IJCNN 2022
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125]  arXiv:2209.07629 (cross-list from cs.SD) [pdf, other]
Title: Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[126]  arXiv:2209.07974 (cross-list from cs.SD) [pdf, other]
Title: musicaiz: A Python Library for Symbolic Music Generation, Analysis and Visualization
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[127]  arXiv:2209.08212 (cross-list from cs.SD) [pdf, other]
Title: Compose & Embellish: Well-Structured Piano Performance Generation via A Two-Stage Approach
Comments: Accepted to International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[128]  arXiv:2209.08774 (cross-list from cs.SD) [pdf, other]
Title: Playing Technique Detection by Fusing Note Onset Information in Guzheng Performance
Comments: Accepted to ISMIR 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[129]  arXiv:2209.08795 (cross-list from cs.MM) [pdf, other]
Title: AutoLV: Automatic Lecture Video Generator
Comments: 4 pages, 4 figures, ICIP 2022
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130]  arXiv:2209.09010 (cross-list from cs.SD) [pdf, other]
Title: The Royalflush System for VoxCeleb Speaker Recognition Challenge 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[131]  arXiv:2209.09076 (cross-list from cs.SD) [pdf, other]
Title: SJTU-AISPEECH System for VoxCeleb Speaker Recognition Challenge 2022
Comments: System description of VoxSRC 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132]  arXiv:2209.09621 (cross-list from cs.IR) [pdf, other]
Title: Scaling and compressing melodies using geometric similarity measures
Subjects: Information Retrieval (cs.IR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133]  arXiv:2209.09635 (cross-list from cs.SD) [pdf]
Title: The BUCEA Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134]  arXiv:2209.09735 (cross-list from cs.LG) [pdf, ps, other]
Title: Relaxed Attention for Transformer Models
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[135]  arXiv:2209.09955 (cross-list from cs.SD) [pdf, other]
Title: Meta-Learning for Adaptive Filters with Higher-Order Frequency Dependencies
Comments: Source code and audio examples: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136]  arXiv:2209.10016 (cross-list from cs.SD) [pdf, other]
Title: Setting the rhythm scene: deep learning-based drum loop generation from arbitrary language cues
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[137]  arXiv:2209.10223 (cross-list from cs.SD) [pdf, other]
Title: Dynamic Time-Alignment of Dimensional Annotations of Emotion using Recurrent Neural Networks
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[138]  arXiv:2209.10259 (cross-list from cs.SD) [pdf, other]
Title: Learning Hierarchical Metrical Structure Beyond Measures
Comments: Accepted at the International Society for Music Information Retrieval (ISMIR), 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[139]  arXiv:2209.10674 (cross-list from cs.SD) [pdf, other]
Title: Modeling Perceptual Loudness of Piano Tone: Theory and Applications
Comments: Accepted to ISMIR 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[140]  arXiv:2209.10791 (cross-list from cs.CL) [pdf, other]
Title: Homophone Reveals the Truth: A Reality Check for Speech2Vec
Authors: Guangyu Chen
Comments: Corrected typos
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141]  arXiv:2209.10804 (cross-list from cs.SD) [pdf, other]
Title: Controllable Accented Text-to-Speech Synthesis
Comments: To be submitted for possible journal publication
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[142]  arXiv:2209.10846 (cross-list from cs.SD) [pdf, ps, other]
Title: The SpeakIn System Description for CNSRC2022
Comments: 4 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[143]  arXiv:2209.10848 (cross-list from cs.SD) [pdf, other]
Title: MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
Comments: Accepted at the 2022 International Conference on Asian Language Processing (IALP2022)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[144]  arXiv:2209.10887 (cross-list from cs.SD) [pdf, other]
Title: A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[145]  arXiv:2209.10970 (cross-list from cs.SD) [pdf, other]
Title: Maths, Computation and Flamenco: overview and challenges
Subjects: Sound (cs.SD); Computational Geometry (cs.CG); Audio and Speech Processing (eess.AS)
[146]  arXiv:2209.11003 (cross-list from cs.SD) [pdf, other]
Title: Predicting pairwise preferences between TTS audio stimuli using parallel ratings data and anti-symmetric twin neural networks
Journal-ref: Proceedings of INTERSPEECH 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[147]  arXiv:2209.11112 (cross-list from cs.SD) [pdf, other]
Title: CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement
Comments: 16 pages, 10 figures and 5 tables. arXiv admin note: text overlap with arXiv:2203.15149
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[148]  arXiv:2209.11377 (cross-list from cs.SD) [pdf, other]
Title: UniKW-AT: Unified Keyword Spotting and Audio Tagging
Comments: Accepted in Interspeech2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149]  arXiv:2209.11527 (cross-list from cs.SD) [pdf, other]
Title: An artificial neural network-based system for detecting machine failures using tiny sound data: A case study
Comments: 8 pages, 9 figures, conference
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[150]  arXiv:2209.11585 (cross-list from cs.SD) [pdf]
Title: Synthetic Voice Spoofing Detection Based On Online Hard Example Mining
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[151]  arXiv:2209.11625 (cross-list from cs.SD) [pdf, ps, other]
Title: The SpeakIn Speaker Verification System for Far-Field Speaker Verification Challenge 2022
Comments: 5 pages. arXiv admin note: text overlap with arXiv:2209.10846
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[152]  arXiv:2209.11905 (cross-list from cs.SD) [pdf, other]
Title: Speech Enhancement with Perceptually-motivated Optimization and Dual Transformations
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[153]  arXiv:2209.11906 (cross-list from cs.SD) [pdf, other]
Title: Joint Speech Activity and Overlap Detection with Multi-Exit Architecture
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[154]  arXiv:2209.12043 (cross-list from cs.SD) [pdf, other]
Title: Unsupervised domain adaptation for speech recognition with unsupervised error correction
Comments: Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[155]  arXiv:2209.12045 (cross-list from cs.SD) [pdf, other]
Title: Song Emotion Recognition: a Performance Comparison Between Audio Features and Artificial Neural Networks
Comments: 7 pages
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[156]  arXiv:2209.12202 (cross-list from cs.SD) [pdf, other]
Title: Multimodal Exponentially Modified Gaussian Oscillators
Comments: IEEE International Ultrasonic Symposium 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS); Applied Physics (physics.app-ph)
[157]  arXiv:2209.12549 (cross-list from cs.SD) [pdf, ps, other]
Title: Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech
Comments: 6 pages, 1 figure, Accepted for APSIPA ASC 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[158]  arXiv:2209.12573 (cross-list from cs.SD) [pdf, other]
Title: Digital Audio Forensics: Blind Human Voice Mimicry Detection
Comments: 11 pages, 4 figures (6 if you count subfigures), 2 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[159]  arXiv:2209.12602 (cross-list from cs.SD) [pdf]
Title: Effects of language mismatch in automatic forensic voice comparison using deep learning embeddings
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[160]  arXiv:2209.12650 (cross-list from cs.CL) [pdf, other]
Title: Bangla-Wave: Improving Bangla Automatic Speech Recognition Utilizing N-gram Language Models
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[161]  arXiv:2209.12652 (cross-list from cs.CL) [pdf, other]
Title: AI-powered Language Assessment Tools for Dementia
Comments: 27 Pages, 11 Tables, 16 Figures
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[162]  arXiv:2209.12816 (cross-list from cs.CL) [pdf, other]
Title: Fast-FNet: Accelerating Transformer Encoder Models via Efficient Fourier Layers
Comments: Submitted to IEEE Transactions on Audio, Speech and Language Processing. 12 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); General Literature (cs.GL); Audio and Speech Processing (eess.AS)
[163]  arXiv:2209.12900 (cross-list from cs.SD) [pdf, other]
Title: The Efficacy of Self-Supervised Speech Models for Audio Representations
Comments: to appear in Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[164]  arXiv:2209.12942 (cross-list from cs.CL) [pdf]
Title: Cross-lingual Dysarthria Severity Classification for English, Korean, and Tamil
Comments: 9 pages, 4 figures, APSIPA 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165]  arXiv:2209.13385 (cross-list from q-bio.QM) [pdf, ps, other]
Title: Beyond Heart Murmur Detection: Automatic Murmur Grading from Phonocardiogram
Subjects: Quantitative Methods (q-bio.QM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166]  arXiv:2209.13598 (cross-list from cs.SD) [pdf, other]
Title: Computing Melodic Templates in Oral Music Traditions
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[167]  arXiv:2209.13914 (cross-list from cs.SD) [pdf, other]
Title: An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[168]  arXiv:2209.13921 (cross-list from cs.HC) [pdf]
Title: Entangling Practice with Artistic and Educational Aims: Interviews on Technology-based Movement Sound Interactions
Comments: New Interfaces for Musical Expression (NIME), Jun 2022, Auckland, New Zealand
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169]  arXiv:2209.14078 (cross-list from cs.SD) [pdf, other]
Title: MeWEHV: Mel and Wave Embeddings for Human Voice Tasks
Comments: Submitted to Expert Systems with Applications
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170]  arXiv:2209.14098 (cross-list from cs.SD) [pdf, other]
Title: Deepfake audio detection by speaker verification
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[171]  arXiv:2209.14272 (cross-list from cs.LG) [pdf, other]
Title: Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[172]  arXiv:2209.14458 (cross-list from cs.SD) [pdf, other]
Title: The Chamber Ensemble Generator: Limitless High-Quality MIR Data via Generative Modeling
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[173]  arXiv:2209.14842 (cross-list from cs.SD) [pdf, other]
Title: Classification of Vocal Bursts for ACII 2022 A-VB-Type Competition using Convolutional Neural Networks and Deep Acoustic Embeddings
Comments: Report for our submission to the ACII 2022 Affective Vocal Bursts (A-VB) Competition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174]  arXiv:2209.14868 (cross-list from cs.SD) [pdf, other]
Title: ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition
Comments: This paper was presented in Interspeech 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[175]  arXiv:2209.15167 (cross-list from cs.SD) [pdf, other]
Title: An empirical study of weakly supervised audio tagging embeddings for general audio representations
Comments: Odyssey 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176]  arXiv:2209.15200 (cross-list from cs.SD) [pdf, other]
Title: An efficient encoder-decoder architecture with top-down attention for speech separation
Comments: Accepted by ICLR 2023; Code & Demos: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[177]  arXiv:2209.15296 (cross-list from cs.SD) [pdf, other]
Title: Wake Word Detection Based on Res2Net
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178]  arXiv:2209.15325 (cross-list from cs.SD) [pdf]
Title: Symphony: Localizing Multiple Acoustic Sources with a Single Microphone Array
Subjects: Sound (cs.SD); Networking and Internet Architecture (cs.NI); Audio and Speech Processing (eess.AS)
[179]  arXiv:2209.15329 (cross-list from cs.CL) [pdf, other]
Title: SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data
Comments: 14 pages
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[180]  arXiv:2209.15334 (cross-list from cs.SD) [pdf]
Title: ChordMics: Acoustic Signal Purification with Distributed Microphones
Subjects: Sound (cs.SD); Networking and Internet Architecture (cs.NI); Audio and Speech Processing (eess.AS)
[181]  arXiv:2209.15352 (cross-list from cs.SD) [pdf, other]
Title: AudioGen: Textually Guided Audio Generation
Comments: Accepted to ICLR 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[182]  arXiv:2209.15483 (cross-list from cs.CL) [pdf, other]
Title: On The Robustness of Self-Supervised Representations for Spoken Language Modeling
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[183]  arXiv:2209.15575 (cross-list from cs.SD) [pdf, other]
Title: Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
