Sound

Authors and titles for recent submissions

[ total of 103 entries; showing entries 1-66 ]

Fri, 2 Jun 2023

[1]  arXiv:2306.00860 [pdf, other]
Title: Differentiable Allpass Filters for Phase Response Estimation and Automatic Signal Alignment
Comments: Collaboration done while interning/employed at Native Instruments. Accepted for publication in Proc. DAFX'23, Copenhagen, Denmark, September 2023. Sound examples at this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:2306.00830 [pdf, ps, other]
Title: Adapting a ConvNeXt model to audio classification on AudioSet
Comments: Accepted at INTERSPEECH 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3]  arXiv:2306.00814 [pdf, other]
Title: Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Authors: Hubert Siuzdak
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[4]  arXiv:2306.00804 [pdf, other]
Title: Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[5]  arXiv:2306.00794 [pdf, other]
Title: SlothSpeech: Denial-of-service Attack Against Speech Recognition Models
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6]  arXiv:2306.00721 [pdf, other]
Title: UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model
Comments: Accepted to Interspeech 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[7]  arXiv:2306.00689 [pdf, other]
Title: Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings
Comments: Accepted in International Journal of Speech Technology, Springer 2023. Substantial overlap with arXiv:2204.01564
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8]  arXiv:2306.00680 [pdf, other]
Title: Encoder-decoder multimodal speaker change detection
Comments: 5 pages, accepted for presentation at INTERSPEECH 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[9]  arXiv:2306.00648 [pdf, other]
Title: EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis
Comments: Accepted by 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10]  arXiv:2306.00614 [pdf]
Title: Adaptation and Optimization of Automatic Speech Recognition (ASR) for the Maritime Domain in the Field of VHF Communication
Journal-ref: Proceedings of the COMPIT Conference 22 (2023) 345-354
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11]  arXiv:2306.00561 [pdf, other]
Title: Masked Autoencoders with Multi-Window Attention Are Better Audio Learners
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[12]  arXiv:2306.00489 [pdf, other]
Title: Speech inpainting: Context-based speech synthesis guided by video
Comments: Accepted in Interspeech23
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[13]  arXiv:2306.00110 [pdf, other]
Title: MuseCoco: Generating Symbolic Music from Text
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[14]  arXiv:2306.00107 [pdf, other]
Title: MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15]  arXiv:2306.00952 (cross-list from eess.AS) [pdf, other]
Title: Speaker-specific Thresholding for Robust Imposter Identification in Unseen Speaker Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[16]  arXiv:2306.00812 (cross-list from eess.AS) [pdf, other]
Title: Harmonic enhancement using learnable comb filter for light-weight full-band speech enhancement model
Comments: accepted by Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17]  arXiv:2306.00755 (cross-list from cs.CL) [pdf, other]
Title: Enhancing the Unified Streaming and Non-streaming Model with Contrastive Learning
Comments: Accepted by INTERSPEECH 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18]  arXiv:2306.00736 (cross-list from eess.AS) [pdf, other]
Title: Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech
Comments: Accepted by Interspeech 2023, 5 pages, 1 figure, 4 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[19]  arXiv:2306.00634 (cross-list from eess.AS) [pdf, other]
Title: A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures
Comments: Accepted for Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[20]  arXiv:2306.00482 (cross-list from cs.CY) [pdf, other]
Title: Inspecting Spoken Language Understanding from Kids for Basic Math Learning at Home
Comments: Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA) at ACL 2023
Subjects: Computers and Society (cs.CY); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); History and Overview (math.HO)
[21]  arXiv:2306.00426 (cross-list from eess.AS) [pdf]
Title: Speaker verification using attentive multi-scale convolutional recurrent network
Comments: 21 pages, 6 figures, 8 tables. Accepted for publication in Applied Soft Computing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22]  arXiv:2306.00410 (cross-list from cs.CL) [pdf, other]
Title: Towards hate speech detection in low-resource languages: Comparing ASR to acoustic word embeddings on Wolof and Swahili
Comments: Accepted to Interspeech 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23]  arXiv:2306.00331 (cross-list from eess.AS) [pdf, other]
Title: A Multi-dimensional Deep Structured State Space Approach to Speech Enhancement Using Small-footprint Models
Comments: Accepted to Interspeech 2023. Code will be released at this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP); Systems and Control (eess.SY)
[24]  arXiv:2306.00281 (cross-list from cs.LG) [pdf, other]
Title: Transfer Learning for Underrepresented Music Generation
Comments: 5 pages, 3 figures, International Conference on Computational Creativity
Journal-ref: Proceedings of the 2023 International Conference on Computational Creativity
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[25]  arXiv:2306.00208 (cross-list from cs.CL) [pdf, other]
Title: Strategies for improving low resource speech to text translation relying on pre-trained ASR models
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26]  arXiv:2306.00160 (cross-list from eess.AS) [pdf, other]
Title: Audio-Visual Speech Separation in Noisy Environments with a Lightweight Iterative Model
Comments: Accepted by Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[27]  arXiv:2306.00044 (cross-list from cs.LG) [pdf, ps, other]
Title: How to Construct Perfect and Worse-than-Coin-Flip Spoofing Countermeasures: A Word of Warning on Shortcut Learning
Comments: Interspeech 2023
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Thu, 1 Jun 2023

[28]  arXiv:2305.20054 [pdf, other]
Title: UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures
Comments: in submission
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[29]  arXiv:2305.19953 [pdf, ps, other]
Title: Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing
Comments: Interspeech 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30]  arXiv:2305.19612 [pdf, other]
Title: Underwater-Art: Expanding Information Perspectives With Text Templates For Underwater Acoustic Target Recognition
Journal-ref: The Journal of the Acoustical Society of America, 2022, 152(5): 2641-2651
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[31]  arXiv:2305.19603 [pdf, other]
Title: Intelligible Lip-to-Speech Synthesis with Speech Units
Comments: Interspeech 2023
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[32]  arXiv:2305.19602 [pdf, other]
Title: Learning Music Sequence Representation from Text Supervision
Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022: 4583-4587
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[33]  arXiv:2305.19581 [pdf, other]
Title: SVVAD: Personal Voice Activity Detection for Speaker Verification
Comments: Accepted by INTERSPEECH 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[34]  arXiv:2305.19567 [pdf, other]
Title: DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Comments: Accepted at Interspeech 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[35]  arXiv:2305.19563 [pdf, other]
Title: Zero-Shot Automatic Pronunciation Assessment
Comments: Accepted to Interspeech 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36]  arXiv:2305.19522 [pdf, other]
Title: PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[37]  arXiv:2305.19458 [pdf, other]
Title: A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[38]  arXiv:2305.19304 [pdf]
Title: Audio classification using ML methods
Authors: Krishna Kumar
Comments: 3 pages, 8 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[39]  arXiv:2305.19769 (cross-list from cs.CL) [pdf, other]
Title: Attention-Based Methods For Audio Question Answering
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40]  arXiv:2305.19750 (cross-list from cs.CL) [pdf, other]
Title: Text-to-Speech Pipeline for Swiss German -- A comparison
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41]  arXiv:2305.19709 (cross-list from cs.CL) [pdf, other]
Title: XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech
Comments: In Proceedings of INTERSPEECH 2023 (to appear)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42]  arXiv:2305.19556 (cross-list from cs.CV) [pdf, other]
Title: Exploring Phonetic Context in Lip Movement for Authentic Talking Face Generation
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)

Wed, 31 May 2023

[43]  arXiv:2305.19130 [pdf, ps, other]
Title: Adaptation of Tongue Ultrasound-Based Silent Speech Interfaces Using Spatial Transformer Networks
Comments: 5 pages, 3 figures, 3 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44]  arXiv:2305.19020 [pdf, other]
Title: Pseudo-Siamese Network based Timbre-reserved Black-box Adversarial Attack in Speaker Identification
Comments: 5 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45]  arXiv:2305.18823 [pdf, other]
Title: Language-independent speaker anonymization using orthogonal Householder neural network
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[46]  arXiv:2305.18794 [pdf, other]
Title: Understanding temporally weakly supervised training: A case study for keyword spotting
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47]  arXiv:2305.18665 [pdf, other]
Title: E-PANNs: Sound Recognition Using Efficient Pre-trained Audio Neural Networks
Comments: Accepted in Internoise 2023 conference
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[48]  arXiv:2305.18596 [pdf, other]
Title: Building Accurate Low Latency ASR for Streaming Voice Search
Comments: Accepted at ACL 2023 Industry Track
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[49]  arXiv:2305.18474 [pdf, other]
Title: Make-An-Audio 2: Temporal-Enhanced Text-to-Audio Generation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[50]  arXiv:2305.18392 [pdf, other]
Title: Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification
Comments: Accepted to Interspeech 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[51]  arXiv:2305.18355 [pdf, other]
Title: An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[52]  arXiv:2305.19269 (cross-list from eess.AS) [pdf, other]
Title: Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[53]  arXiv:2305.19255 (cross-list from eess.AS) [pdf, other]
Title: A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem
Comments: Accepted for presentation at Interspeech 2023. arXiv admin note: substantial text overlap with arXiv:2210.15982
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[54]  arXiv:2305.19228 (cross-list from cs.CL) [pdf, other]
Title: Unsupervised Melody-to-Lyric Generation
Comments: Accepted to ACL 23. arXiv admin note: substantial text overlap with arXiv:2305.07760
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[55]  arXiv:2305.19184 (cross-list from eess.AS) [pdf, other]
Title: Leveraging Semantic Information for Efficient Self-Supervised Emotion Recognition with Audio-Textual Distilled Models
Comments: Accepted at Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[56]  arXiv:2305.19100 (cross-list from eess.AS) [pdf, other]
Title: Predicting Preferred Dialogue-to-Background Loudness Difference in Dialogue-Separated Audio
Comments: Paper accepted at the 15th International Conference on Quality of Multimedia Experience (QoMEX), 4 pages, 2 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57]  arXiv:2305.19090 (cross-list from eess.AS) [pdf]
Title: Prospective Validation of Motor-Based Intervention with Automated Mispronunciation Detection of Rhotics in Residual Speech Sound Disorders
Comments: To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58]  arXiv:2305.19051 (cross-list from eess.AS) [pdf, other]
Title: Towards single integrated spoofing-aware speaker verification embeddings
Comments: Accepted by INTERSPEECH 2023. Code and models are available in this https URL
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[59]  arXiv:2305.18975 (cross-list from eess.AS) [pdf, other]
Title: Voice Conversion With Just Nearest Neighbors
Comments: 5 pages, 1 table, 2 figures. Accepted at Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[60]  arXiv:2305.18925 (cross-list from eess.AS) [pdf, other]
Title: Investigating model performance in language identification: beyond simple error statistics
Comments: Accepted to Interspeech 2023, 5 pages, 5 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[61]  arXiv:2305.18802 (cross-list from eess.AS) [pdf, other]
Title: LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Comments: Accepted to Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[62]  arXiv:2305.18753 (cross-list from eess.AS) [pdf, other]
Title: Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Comments: INTERSPEECH 2023. arXiv admin note: substantial text overlap with arXiv:2210.05037
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[63]  arXiv:2305.18551 (cross-list from astro-ph.IM) [pdf]
Title: Multi-Band Acoustic Monitoring of Aerial Signatures
Journal-ref: Journal of Astronomical Instrumentation, 12(1), 2340005 (2023)
Subjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[64]  arXiv:2305.18441 (cross-list from eess.AS) [pdf, other]
Title: DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Comments: INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[65]  arXiv:2305.18419 (cross-list from cs.CL) [pdf, other]
Title: Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR
Comments: Interspeech 2023. First 3 authors contributed equally
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Tue, 30 May 2023 (showing first 1 of 25 entries)

[66]  arXiv:2305.18108 [pdf, other]
Title: Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning
Comments: Accepted at INTERSPEECH 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)