Audio and Speech Processing

Authors and titles for eess.AS in Sep 2023

Total of 465 entries; entries 1-25 shown.

[1] arXiv:2309.00169 [pdf, other]: Title: RepCodec: A Speech Representation Codec for Speech Tokenization

Authors: Zhichao Huang, Chutong Meng, Tom Ko

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[2] arXiv:2309.00223 [pdf, other]: Title: The FruitShell French synthesis system at the Blizzard 2023 Challenge

Authors: Xin Qi, Xiaopeng Wang, Zhiyong Wang, Wang Liu, Mingming Ding, Shuchen Shi

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[3] arXiv:2309.00376 [pdf, other]: Title: Remixing-based Unsupervised Source Separation from Scratch

Authors: Kohei Saijo, Tetsuji Ogawa

Comments: Interspeech 2023, 5 pages, 2 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[4] arXiv:2309.00424 [pdf, other]: Title: Learning Speech Representation From Contrastive Token-Acoustic Pretraining

Authors: Chunyu Qiang, Hao Li, Yixin Tian, Ruibo Fu, Tao Wang, Longbiao Wang, Jianwu Dang

Comments: Accepted by ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Sound (cs.SD)
[5] arXiv:2309.00647 [pdf, other]: Title: Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

Authors: Seunghan Yang, Byeonggeun Kim, Kyuhong Shim, Simyung Chang

Comments: Interspeech 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[6] arXiv:2309.01108 [pdf, other]: Title: Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?

Authors: Sarthak Kumar Maharana, Krishna Kamal Adidam, Shoumik Nandi, Ajitesh Srivastava

Comments: Accepted to IEEE ICASSP Workshops 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[7] arXiv:2309.01142 [pdf, other]: Title: MSM-VC: High-fidelity Source Style Transfer for Non-Parallel Voice Conversion by Multi-scale Style Modeling

Authors: Zhichao Wang, Xinsheng Wang, Qicong Xie, Tao Li, Lei Xie, Qiao Tian, Yuping Wang

Comments: This work was submitted on April 10, 2022 and accepted on August 29, 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2309.01164 [pdf, other]: Title: Noise robust speech emotion recognition with signal-to-noise ratio adapting speech enhancement

Authors: Yu-Wen Chen, Julia Hirschberg, Yu Tsao

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[9] arXiv:2309.01513 [pdf, other]: Title: RGI-Net: 3D Room Geometry Inference from Room Impulse Responses in the Absence of First-order Echoes

Authors: Inmo Yeon, Jung-Woo Choi

Comments: 5 pages, 3 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[10] arXiv:2309.01535 [pdf, other]: Title: Single-Channel Speech Enhancement with Deep Complex U-Networks and Probabilistic Latent Space Models

Authors: Eike J. Nustede, Jörn Anemüller

Journal-ref: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023, pp. 1-5

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[11] arXiv:2309.02265 [pdf, other]: Title: PESTO: Pitch Estimation with Self-supervised Transposition-equivariant Objective

Authors: Alain Riou, Stefan Lattner, Gaëtan Hadjeres, Geoffroy Peeters

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12] arXiv:2309.02285 [pdf, other]: Title: PromptTTS 2: Describing and Generating Voices with Text Prompt

Authors: Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[13] arXiv:2309.02393 [pdf, other]: Title: In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms

Authors: Philipp Schilk, Niccolò Polvani, Andrea Ronco, Milos Cernak, Michele Magno

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[14] arXiv:2309.02418 [pdf, other]: Title: Personalized Adaptation with Pre-trained Speech Encoders for Continuous Emotion Recognition

Authors: Minh Tran, Yufeng Yin, Mohammad Soleymani

Comments: Accepted by INTERSPEECH 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[15] arXiv:2309.02432 [pdf, other]: Title: Employing Real Training Data for Deep Noise Suppression

Authors: Ziyi Xu, Marvin Sach, Jan Pirklbauer, Tim Fingscheidt

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16] arXiv:2309.02466 [pdf, ps, other]: Title: Minimal Effective Theory for Phonotactic Memory: Capturing Local Correlations due to Errors in Speech

Authors: Paul Myles Eugenio

Comments: 16 pages, 7 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[17] arXiv:2309.02539 [pdf, other]: Title: A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation

Authors: Karn N. Watcharasupat, Chih-Wei Wu, Yiwei Ding, Iroro Orife, Aaron J. Hipple, Phillip A. Williams, Scott Kramer, Alexander Lerch, William Wolcott

Comments: Accepted to the IEEE Open Journal of Signal Processing (ICASSP 2024 Track)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[18] arXiv:2309.02567 [pdf, other]: Title: Symbolic Music Representations for Classification Tasks: A Systematic Evaluation

Authors: Huan Zhang, Emmanouil Karystinaios, Simon Dixon, Gerhard Widmer, Carlos Eduardo Cancino-Chacón

Comments: To be published in the Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

Journal-ref: Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD)
[19] arXiv:2309.02592 [pdf, other]: Title: BWSNet: Automatic Perceptual Assessment of Audio Signals

Authors: Clément Le Moine Veillon, Victor Rosi, Pablo Arias Sarah, Léane Salais, Nicolas Obin

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[20] arXiv:2309.02730 [pdf, other]: Title: Stylebook: Content-Dependent Speaking Style Modeling for Any-to-Any Voice Conversion using Only Speech Data

Authors: Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser

Comments: 5 pages, 2 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[21] arXiv:2309.02743 [pdf, other]: Title: MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023

Authors: Zhihang Xu, Shaofei Zhang, Xi Wang, Jiajun Zhang, Wenning Wei, Lei He, Sheng Zhao

Comments: 6 pages

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22] arXiv:2309.03019 [pdf, other]: Title: Leveraging ASR Pretrained Conformers for Speaker Verification through Transfer Learning and Knowledge Distillation

Authors: Danwei Cai, Ming Li

Subjects: Audio and Speech Processing (eess.AS)
[23] arXiv:2309.03149 [pdf, other]: Title: Real-time auralization for performers on virtual stages

Authors: Ernesto Accolti, Lukas Aspöck, Manuj Yadav, Michael Vorländer

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24] arXiv:2309.03199 [pdf, other]: Title: Matcha-TTS: A fast TTS architecture with conditional flow matching

Authors: Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, Gustav Eje Henter

Comments: 5 pages, 3 figures. Final version, accepted to IEEE ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD)
[25] arXiv:2309.03337 [pdf, other]: Title: Leveraging Geometrical Acoustic Simulations of Spatial Room Impulse Responses for Improved Sound Event Detection and Localization

Authors: Christopher Ick, Brian McFee

Comments: 5 pages, 3 figures, 3 tables, presented in the Proceedings of the 8th Detection and Classification of Acoustic Scenes and Events 2023 Workshop (DCASE2023)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
