
Audio and Speech Processing

Authors and titles for eess.AS in Feb 2023, skipping first 50

[ total of 182 entries: 1-25 | 26-50 | 51-75 | 76-100 | 101-125 | 126-150 | ... | 176-182 ]
[51]  arXiv:2302.12369 [pdf, other]
Title: Factual Consistency Oriented Speech Recognition
Comments: 5 pages, 1 figure, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[52]  arXiv:2302.12391 [pdf, other]
Title: PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS
Comments: 6 pages, preprint
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53]  arXiv:2302.12757 [pdf, other]
Title: Ensemble knowledge distillation of self-supervised speech models
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[54]  arXiv:2302.13063 [pdf, other]
Title: Time-Variance Aware Real-Time Speech Enhancement
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[55]  arXiv:2302.13209 [pdf, other]
Title: I-MSV 2022: Indic-Multilingual and Multi-sensor Speaker Verification Challenge
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56]  arXiv:2302.13407 [pdf, other]
Title: DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement
Comments: 5 pages, 1 figure, 2 tables
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57]  arXiv:2302.13458 [pdf, other]
Title: Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Comments: Accepted for ICASSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58]  arXiv:2302.13527 [pdf, ps, other]
Title: Complex Clipping for Improved Generalization in Machine Learning
Comments: Submitted to IEEE Signal Processing Letters
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[59]  arXiv:2302.13652 [pdf, ps, other]
Title: Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[60]  arXiv:2302.13750 [pdf, other]
Title: MoLE: Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition
Comments: Accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[61]  arXiv:2302.14036 [pdf, other]
Title: Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
Comments: Accepted to INTERSPEECH 2023
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[62]  arXiv:2302.14120 [pdf, other]
Title: Diagonal State Space Augmented Transformers for Speech Recognition
Comments: to be presented at ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[63]  arXiv:2302.14572 [pdf, other]
Title: Training sound event detection with soft labels from crowdsourced annotations
Comments: ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64]  arXiv:2302.14638 [pdf, other]
Title: SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing
Comments: 14 pages, 7 figures, 14 tables, TASLP 2023 paper
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[65]  arXiv:2302.14748 [pdf, other]
Title: Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement
Comments: 5 pages, 2 figures, Accepted to Interspeech 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[66]  arXiv:2302.14815 [pdf, other]
Title: Incremental Learning of Acoustic Scenes and Sound Events
Comments: Accepted to DCASE2023 Workshop
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[67]  arXiv:2302.05309 (cross-list from eess.SP) [pdf, other]
Title: The LuViRA Dataset: Synchronized Vision, Radio, and Audio Sensors for Indoor Localization
Comments: 7 pages, 7 figures, Accepted to ICRA 2024
Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68]  arXiv:2302.07203 (cross-list from eess.IV) [pdf, other]
Title: Synthesizing audio from tongue motion during speech using tagged MRI via transformer
Comments: SPIE Medical Imaging: Deep Dive Oral
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[69]  arXiv:2302.13854 (cross-list from eess.SP) [pdf, other]
Title: A Deep Neural Network Based Reverse Radio Spectrogram Search Algorithm
Comments: 8 pages, 8 figures
Journal-ref: RAS Techniques and Instruments 2023
Subjects: Signal Processing (eess.SP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70]  arXiv:2302.00286 (cross-list from cs.SD) [pdf, other]
Title: Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
Comments: arXiv admin note: text overlap with arXiv:2206.10805
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71]  arXiv:2302.00646 (cross-list from cs.SD) [pdf, other]
Title: Epic-Sounds: A Large-scale Dataset of Actions That Sound
Comments: 6 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72]  arXiv:2302.00765 (cross-list from cs.CL) [pdf, other]
Title: Visually Grounded Keyword Detection and Localisation for Low-Resource Languages
Comments: PhD dissertation, University of Stellenbosch, 108 pages, submitted and accepted 2023
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73]  arXiv:2302.00836 (cross-list from cs.CL) [pdf, other]
Title: Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition
Comments: The 13th International Symposium on Chinese Spoken Language Processing (ISCSLP 2022)
Journal-ref: Published in ISCSLP 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74]  arXiv:2302.00868 (cross-list from cs.SD) [pdf, other]
Title: Speech Enhancement for Virtual Meetings on Cellular Networks
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[75]  arXiv:2302.01090 (cross-list from cs.SD) [pdf, other]
Title: Goniometers are a Powerful Acoustic Feature for Music Information Retrieval Tasks
Authors: Tim Ziemer
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
