Audio and Speech Processing

Authors and titles for eess.AS in Feb 2023, skipping first 50

Entries 51-75 of 182 total.

[51] arXiv:2302.12369 [pdf, other]: Title: Factual Consistency Oriented Speech Recognition

Authors: Naoyuki Kanda, Takuya Yoshioka, Yang Liu

Comments: 5 pages, 1 figure, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[52] arXiv:2302.12391 [pdf, other]: Title: PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

Authors: Junhyeok Lee, Wonbin Jung, Hyunjae Cho, Jaeyeon Kim, Jaehwan Kim

Comments: 6 pages, preprint

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53] arXiv:2302.12757 [pdf, other]: Title: Ensemble knowledge distillation of self-supervised speech models

Authors: Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

Comments: Accepted by ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[54] arXiv:2302.13063 [pdf, other]: Title: Time-Variance Aware Real-Time Speech Enhancement

Authors: Chengyu Zheng, Yuan Zhou, Xiulian Peng, Yuan Zhang, Yan Lu

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[55] arXiv:2302.13209 [pdf, other]: Title: I-MSV 2022: Indic-Multilingual and Multi-sensor Speaker Verification Challenge

Authors: Jagabandhu Mishra, Mrinmoy Bhattacharjee, S. R. Mahadeva Prasanna

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56] arXiv:2302.13407 [pdf, other]: Title: DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement

Authors: Anton Kovalyov, Kashyap Patel, Issa Panahi

Comments: 5 pages, 1 figure, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57] arXiv:2302.13458 [pdf, other]: Title: Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow

Authors: Yoonhyung Lee, Jinhyeok Yang, Kyomin Jung

Comments: Accepted for ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[58] arXiv:2302.13527 [pdf, ps, other]: Title: Complex Clipping for Improved Generalization in Machine Learning

Authors: Les Atlas, Nicholas Rasmussen, Felix Schwock, Mert Pilanci

Comments: Submitted to IEEE Signal Processing Letters

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[59] arXiv:2302.13652 [pdf, ps, other]: Title: Duration-aware pause insertion using pre-trained language model for multi-speaker text-to-speech

Authors: Dong Yang, Tomoki Koriyama, Yuki Saito, Takaaki Saeki, Detai Xin, Hiroshi Saruwatari

Comments: Accepted by ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[60] arXiv:2302.13750 [pdf, other]: Title: MoLE : Mixture of Language Experts for Multi-Lingual Automatic Speech Recognition

Authors: Yoohwan Kwon, Soo-Whan Chung

Comments: Accepted by ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[61] arXiv:2302.14036 [pdf, other]: Title: Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator

Authors: Vladimir Bataev, Roman Korostik, Evgeny Shabalin, Vitaly Lavrukhin, Boris Ginsburg

Comments: Accepted to INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[62] arXiv:2302.14120 [pdf, other]: Title: Diagonal State Space Augmented Transformers for Speech Recognition

Authors: George Saon, Ankit Gupta, Xiaodong Cui

Comments: to be presented at ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[63] arXiv:2302.14572 [pdf, other]: Title: Training sound event detection with soft labels from crowdsourced annotations

Authors: Irene Martín-Morató, Manu Harju, Paul Ahokas, Annamaria Mesaros

Comments: ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64] arXiv:2302.14638 [pdf, other]: Title: SpeechFormer++: A Hierarchical Efficient Framework for Paralinguistic Speech Processing

Authors: Weidong Chen, Xiaofen Xing, Xiangmin Xu, Jianxin Pang, Lan Du

Comments: 14 pages, 7 figures, 14 tables, TASLP 2023 paper

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[65] arXiv:2302.14748 [pdf, other]: Title: Reducing the Prior Mismatch of Stochastic Differential Equations for Diffusion-based Speech Enhancement

Authors: Bunlong Lay, Simon Welker, Julius Richter, Timo Gerkmann

Comments: 5 pages, 2 figures, Accepted to Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[66] arXiv:2302.14815 [pdf, other]: Title: Incremental Learning of Acoustic Scenes and Sound Events

Authors: Manjunath Mulimani, Annamaria Mesaros

Comments: Accepted to DCASE2023 Workshop

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[67] arXiv:2302.05309 (cross-list from eess.SP) [pdf, other]: Title: The LuViRA Dataset: Synchronized Vision, Radio, and Audio Sensors for Indoor Localization

Authors: Ilayda Yaman, Guoda Tian, Martin Larsson, Patrik Persson, Michiel Sandra, Alexander Dürr, Erik Tegler, Nikhil Challa, Henrik Garde, Fredrik Tufvesson, Kalle Åström, Ove Edfors, Steffen Malkowsky, Liang Liu

Comments: 7 pages, 7 figures, Accepted to ICRA 2024

Subjects: Signal Processing (eess.SP); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[68] arXiv:2302.07203 (cross-list from eess.IV) [pdf, other]: Title: Synthesizing audio from tongue motion during speech using tagged MRI via transformer

Authors: Xiaofeng Liu, Fangxu Xing, Jerry L. Prince, Maureen Stone, Georges El Fakhri, Jonghye Woo

Comments: SPIE Medical Imaging: Deep Dive Oral

Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[69] arXiv:2302.13854 (cross-list from eess.SP) [pdf, other]: Title: A Deep Neural Network Based Reverse Radio Spectrogram Search Algorithm

Authors: Peter Xiangyuan Ma, Steve Croft, Chris Lintott, Andrew P. V. Siemion

Comments: 8 pages, 8 figures

Journal-ref: RAS Techniques and Instruments 2023

Subjects: Signal Processing (eess.SP); Instrumentation and Methods for Astrophysics (astro-ph.IM); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70] arXiv:2302.00286 (cross-list from cs.SD) [pdf, other]: Title: Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training

Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans

Comments: arXiv admin note: text overlap with arXiv:2206.10805

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71] arXiv:2302.00646 (cross-list from cs.SD) [pdf, other]: Title: Epic-Sounds: A Large-scale Dataset of Actions That Sound

Authors: Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman

Comments: 6 pages, 4 figures

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[72] arXiv:2302.00765 (cross-list from cs.CL) [pdf, other]: Title: Visually Grounded Keyword Detection and Localisation for Low-Resource Languages

Authors: Kayode Kolawole Olaleye

Comments: PhD dissertation, University of Stellenbosch, 108 pages, submitted and accepted 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[73] arXiv:2302.00836 (cross-list from cs.CL) [pdf, other]: Title: Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

Authors: HoLam Chung, Junan Li, Pengfei Liu, Wai-Kim Leung, Xixin Wu, Helen Meng

Comments: The 13th International Symposium on Chinese Spoken Language Processing (ISCSLP 2022)

Journal-ref: Published in ISCSLP 2022

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74] arXiv:2302.00868 (cross-list from cs.SD) [pdf, other]: Title: Speech Enhancement for Virtual Meetings on Cellular Networks

Authors: Hojeong Lee, Minseon Gwak, Kawon Lee, Minjeong Kim, Joseph Konan, Ojas Bhargave

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[75] arXiv:2302.01090 (cross-list from cs.SD) [pdf, other]: Title: Goniometers are a Powerful Acoustic Feature for Music Information Retrieval Tasks

Authors: Tim Ziemer

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
