Audio and Speech Processing

Authors and titles for eess.AS in Sep 2023, skipping first 75

[76] arXiv:2309.08157 [pdf, other]: Title: RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function

Authors: Pengyu Wang, Xiaofei Li

Comments: Submitted to ICASSP2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[77] arXiv:2309.08255 [pdf, other]: Title: Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

Authors: Dariusz Piotrowski, Renard Korzeniowski, Alessio Falai, Sebastian Cygert, Kamil Pokora, Georgi Tinchev, Ziyao Zhang, Kayoko Yanagisawa

Comments: Accepted at ICONIP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[78] arXiv:2309.08263 [pdf, other]: Title: Improving Voice Conversion for Dissimilar Speakers Using Perceptual Losses

Authors: Suhita Ghosh, Yamini Sinha, Ingo Siegert, Sebastian Stober

Comments: Accepted in The German Annual Conference on Acoustics 2023 (DAGA)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79] arXiv:2309.08279 [pdf, other]: Title: Improving Short Utterance Anti-Spoofing with AASIST2

Authors: Yuxiang Zhang, Jingze Lu, Zengqiang Shang, Wenchao Wang, Pengyuan Zhang

Comments: 5 pages, 2 figures, accepted by ICASSP

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[80] arXiv:2309.08285 [pdf, other]: Title: One-Class Knowledge Distillation for Spoofing Speech Detection

Authors: Jingze Lu, Yuxiang Zhang, Wenchao Wang, Zengqiang Shang, Pengyuan Zhang

Comments: submitted to icassp 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[81] arXiv:2309.08290 [pdf, other]: Title: Head-Related Transfer Function Interpolation with a Spherical CNN

Authors: Xingyu Chen, Fei Ma, Yile Zhang, Amy Bastine, Prasanga N. Samarasinghe

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[82] arXiv:2309.08294 [pdf, other]: Title: Speech-dependent Modeling of Own Voice Transfer Characteristics for In-ear Microphones in Hearables

Authors: Mattes Ohlenbusch, Christian Rollwage, Simon Doclo

Comments: Presented at Forum Acusticum 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[83] arXiv:2309.08295 [pdf, other]: Title: A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism

Authors: Ilya Gurvich, Ido Leichter, Dharmendar Reddy Palle, Yossi Asher, Alon Vinnikov, Igor Abramovski, Vishak Gopal, Ross Cutler, Eyal Krupka

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[84] arXiv:2309.08320 [pdf, other]: Title: Diff-SV: A Unified Hierarchical Framework for Noise-Robust Speaker Verification Using Score-Based Diffusion Probabilistic Models

Authors: Ju-ho Kim, Jungwoo Heo, Hyun-seo Shin, Chan-yeong Lim, Ha-Jin Yu

Comments: 5 pages, 2 figures, accepted for ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[85] arXiv:2309.08348 [pdf, other]: Title: The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Authors: Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao

Comments: 5 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[86] arXiv:2309.08355 [pdf, other]: Title: Semi-supervised Sound Event Detection with Local and Global Consistency Regularization

Authors: Yiming Li, Xiangdong Wang, Hong Liu, Rui Tao, Long Yan, Kazushige Ouchi

Comments: submitted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[87] arXiv:2309.08357 [pdf, other]: Title: Audio-free Prompt Tuning for Language-Audio Models

Authors: Yiming Li, Xiangdong Wang, Hong Liu

Comments: submitted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[88] arXiv:2309.08377 [pdf, other]: Title: DiaCorrect: Error Correction Back-end For Speaker Diarization

Authors: Jiangyu Han, Federico Landini, Johan Rohdin, Mireia Diez, Lukas Burget, Yuhang Cao, Heng Lu, Jan Cernocky

Comments: Submitted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[89] arXiv:2309.08436 [pdf, other]: Title: Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

Authors: Mohammad Zeineldeen, Albert Zeyer, Ralf Schlüter, Hermann Ney

Comments: Accepted at ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
[90] arXiv:2309.08454 [pdf, other]: Title: Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition

Authors: Peter Vieting, Simon Berger, Thilo von Neumann, Christoph Boeddeker, Ralf Schlüter, Reinhold Haeb-Umbach

Comments: Submitted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[91] arXiv:2309.08489 [pdf, other]: Title: Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network

Authors: Yiling Huang, Weiran Wang, Guanlong Zhao, Hank Liao, Wei Xia, Quan Wang

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[92] arXiv:2309.08561 [pdf, other]: Title: Open-vocabulary Keyword-spotting with Adaptive Instance Normalization

Authors: Aviv Navon, Aviv Shamsian, Neta Glazer, Gill Hetz, Joseph Keshet

Comments: Under Review

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[93] arXiv:2309.08684 [pdf, other]: Title: Music Source Separation Based on a Lightweight Deep Learning Framework (DTTNET: DUAL-PATH TFC-TDF UNET)

Authors: Junyu Chen, Susmitha Vekkot, Pancham Shukla

Comments: Accepted for ICASSP 2024. Additional experiments can be found in the published version on IEEE Xplore

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94] arXiv:2309.08730 [pdf, other]: Title: MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

Authors: Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos

Journal-ref: 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Sound (cs.SD)
[95] arXiv:2309.08804 [pdf, other]: Title: Stack-and-Delay: a new codebook pattern for music generation

Authors: Gael Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest Iandola, Vikas Chandra

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[96] arXiv:2309.08828 [pdf, other]: Title: Boosting End-to-End Multilingual Phoneme Recognition through Exploiting Universal Speech Attributes Constraints

Authors: Hao Yen, Sabato Marco Siniscalchi, Chin-Hui Lee

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[97] arXiv:2309.08876 [pdf, ps, other]: Title: Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

Authors: Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[98] arXiv:2309.09028 [pdf, other]: Title: Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions

Authors: Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu

Comments: Paper in submission

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[99] arXiv:2309.09180 [pdf, other]: Title: Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

Authors: Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

Comments: Accepted by ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[100] arXiv:2309.09220 [pdf, other]: Title: Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

Authors: Ahmed Adel Attia, Yashish M. Siriwardena, Carol Espy-Wilson

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
