Audio and Speech Processing

Authors and titles for eess.AS in Sep 2023 (entries 76-100)

[ total of 465 entries: 1-25 | 26-50 | 51-75 | 76-100 | 101-125 | 126-150 | 151-175 | ... | 451-465 ]
[76]  arXiv:2309.08157 [pdf, other]
Title: RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function
Comments: Submitted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[77]  arXiv:2309.08255 [pdf, other]
Title: Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech
Comments: Accepted at ICONIP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[78]  arXiv:2309.08263 [pdf, other]
Title: Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses
Comments: Accepted at the German Annual Conference on Acoustics 2023 (DAGA)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79]  arXiv:2309.08279 [pdf, other]
Title: Improving Short Utterance Anti-Spoofing with AASIST2
Comments: 5 pages, 2 figures, accepted by ICASSP
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[80]  arXiv:2309.08285 [pdf, other]
Title: One-Class Knowledge Distillation for Spoofing Speech Detection
Comments: Submitted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[81]  arXiv:2309.08290 [pdf, other]
Title: Head-Related Transfer Function Interpolation with a Spherical CNN
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[82]  arXiv:2309.08294 [pdf, other]
Title: Speech-dependent Modeling of Own Voice Transfer Characteristics for In-ear Microphones in Hearables
Comments: Presented at Forum Acusticum 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[83]  arXiv:2309.08295 [pdf, other]
Title: A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[84]  arXiv:2309.08320 [pdf, other]
Title: Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models
Comments: 5 pages, 2 figures, accepted for ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[85]  arXiv:2309.08348 [pdf, other]
Title: The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[86]  arXiv:2309.08355 [pdf, other]
Title: Semi-supervised Sound Event Detection with Local and Global Consistency Regularization
Comments: submitted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[87]  arXiv:2309.08357 [pdf, other]
Title: Audio-free Prompt Tuning for Language-Audio Models
Comments: submitted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[88]  arXiv:2309.08377 [pdf, other]
Title: DiaCorrect: Error Correction Back-end For Speaker Diarization
Comments: Submitted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[89]  arXiv:2309.08436 [pdf, other]
Title: Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition
Comments: Accepted at ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
[90]  arXiv:2309.08454 [pdf, other]
Title: Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition
Comments: Submitted to ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[91]  arXiv:2309.08489 [pdf, other]
Title: Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[92]  arXiv:2309.08561 [pdf, other]
Title: Open-vocabulary Keyword-spotting with Adaptive Instance Normalization
Comments: Under Review
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[93]  arXiv:2309.08684 [pdf, other]
Title: Music Source Separation Based on a Lightweight Deep Learning Framework (DTTNET: DUAL-PATH TFC-TDF UNET)
Comments: Accepted for ICASSP 2024. Additional experiments can be found in the published version on IEEE Xplore
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94]  arXiv:2309.08730 [pdf, other]
Title: MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
Journal-ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[95]  arXiv:2309.08804 [pdf, other]
Title: Stack-and-Delay: a new codebook pattern for music generation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[96]  arXiv:2309.08828 [pdf, other]
Title: Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[97]  arXiv:2309.08876 [pdf, ps, other]
Title: Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[98]  arXiv:2309.09028 [pdf, other]
Title: Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Comments: Paper in submission
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[99]  arXiv:2309.09180 [pdf, other]
Title: Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture
Comments: Accepted by ICASSP 2024
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[100]  arXiv:2309.09220 [pdf, other]
Title: Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)