Audio and Speech Processing

Authors and titles for eess.AS in Aug 2022

[ total of 149 entries: 1-25 | 26-50 | 51-75 | 76-100 | ... | 126-149 ]

[1] arXiv:2208.00987 [pdf, other]: Title: DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition

Authors: Z. Guo, C. Chen, E.S. Chng

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2] arXiv:2208.01041 [pdf, ps, other]: Title: Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review

Authors: Arushi, Roberto Dillon, Ai Ni Teoh, Denise Dillon

Comments: 41 pages, 7 figures, 4 tables

Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[3] arXiv:2208.01555 [pdf, other]: Title: Low-complexity CNNs for Acoustic Scene Classification

Authors: Arshdeep Singh, James A King, Xubo Liu, Wenwu Wang, Mark D. Plumbley

Comments: Technical Report DCASE 2022 TASK 1. arXiv admin note: substantial text overlap with arXiv:2207.11529

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[4] arXiv:2208.02189 [pdf, other]: Title: A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

Authors: Qibing Bai, Tom Ko, Yu Zhang

Comments: Accepted by INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[5] arXiv:2208.02406 [pdf, ps, other]: Title: Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network

Authors: Yanxiong Li, Wenchang Cao, Konstantinos Drossos, Tuomas Virtanen

Comments: 6 pages, 5 figures, 4 tables. Accepted by IEEE MMSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[6] arXiv:2208.02778 [pdf, other]: Title: Attention and DCT based Global Context Modeling for Text-independent Speaker Recognition

Authors: Wei Xia, John H.L. Hansen

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2208.03023 [pdf, other]: Title: AID: Open-source Anechoic Interferer Dataset

Authors: Philipp Götz, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets

Comments: Accepted for publication at IWAENC 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2208.03421 [pdf, other]: Title: SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring

Authors: Jisheng Bai, Jianfeng Chen, Mou Wang, Muhammad Saad Ayub, Qingli Yan

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2208.04101 [pdf, other]: Title: FRA-RIR: Fast Random Approximation of the Image-source Method

Authors: Yi Luo, Jianwei Yu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[10] arXiv:2208.04622 [pdf, other]: Title: An Anchor-Free Detector for Continuous Speech Keyword Spotting

Authors: Zhiyuan Zhao, Chuanxin Tang, Chengdong Yao, Chong Luo

Comments: Accepted by Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[11] arXiv:2208.04626 [pdf, ps, other]: Title: Recycling an anechoic pre-trained speech separation deep neural network for binaural dereverberation of a single source

Authors: Sania Gul, Muhammad Salman Khan, Syed Waqar Shah, Ata Ur-Rehman

Comments: 15 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12] arXiv:2208.04654 [pdf, other]: Title: Extending GCC-PHAT using Shift Equivariant Neural Networks

Authors: Axel Berg, Mark O'Connor, Kalle Åström, Magnus Oskarsson

Comments: Proceedings of INTERSPEECH

Journal-ref: Proc. Interspeech 2022, 1791-1795

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[13] arXiv:2208.05122 [pdf, other]: Title: Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech

Authors: Kaitao Song, Teng Wan, Bixia Wang, Huiqiang Jiang, Luna Qiu, Jiahang Xu, Liping Jiang, Qun Lou, Yuqing Yang, Dongsheng Li, Xudong Wang, Lili Qiu

Comments: Accepted by InterSpeech 2022

Subjects: Audio and Speech Processing (eess.AS)
[14] arXiv:2208.05184 [pdf, ps, other]: Title: Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source

Authors: Sania Gul, Muhammad Salman Khan, Syed Waqar Shah

Comments: 25 pages, 7 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[15] arXiv:2208.05413 [pdf, other]: Title: Non-Contrastive Self-Supervised Learning of Utterance-Level Speech Representations

Authors: Jaejin Cho, Raghavendra Pappagari, Piotr Żelasko, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak

Comments: Accepted at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[16] arXiv:2208.05445 [pdf, other]: Title: Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech

Authors: Jaejin Cho, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

Comments: EARLY ACCESS of IEEE JSTSP Special Issue on Self-Supervised Learning for Speech and Audio Processing

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[17] arXiv:2208.05735 [pdf, other]: Title: Chewing Detection from Commercial Smart-glasses

Authors: Vasileios Papapanagiotou, Anastasia Liapi, Anastasios Delopoulos

Comments: 6 pages, 4 figures, 1 table, conference

Journal-ref: Proceedings of the 7th International Workshop on Multimedia Assisted Dietary Management (MADiMa '22), October 10, 2022, Lisboa, Portugal

Subjects: Audio and Speech Processing (eess.AS)
[18] arXiv:2208.05782 [pdf, other]: Title: Comparison and Analysis of New Curriculum Criteria for End-to-End ASR

Authors: Georgios Karakasidis, Tamás Grósz, Mikko Kurimo

Comments: 5 pages, 2 figures, in Proceedings Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[19] arXiv:2208.05830 [pdf, other]: Title: Speech Enhancement and Dereverberation with Diffusion-based Generative Models

Authors: Julius Richter, Simon Welker, Jean-Marie Lemercier, Bunlong Lay, Timo Gerkmann

Comments: Accepted version

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[20] arXiv:2208.07282 [pdf, other]: Title: Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer

Authors: Shahan Nercessian

Comments: A revised version of this work has been accepted to the 154th AES Convention. To cite this work, please refer to the AES manuscript available at this https URL ; 12 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[21] arXiv:2208.07446 [pdf, other]: Title: C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification

Authors: Chunlei Zhang, Dong Yu

Comments: Accepted to IEEE Journal of Selected Topics in Signal Processing

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22] arXiv:2208.07657 [pdf, other]: Title: Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition

Authors: Andrei Andrusenko, Rauf Nasretdinov, Aleksei Romanenko

Comments: 5 pages, 1 figure, accepted by ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[23] arXiv:2208.08012 [pdf, other]: Title: Disentangled Speaker Representation Learning via Mutual Information Minimization

Authors: Sung Hwan Mun, Min Hyun Han, Minchan Kim, Dongjune Lee, Nam Soo Kim

Comments: Accepted by APSIPA ASC 2022. Camera-ready. 8 pages, 4 figures, and 1 table

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24] arXiv:2208.08757 [pdf, other]: Title: Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion

Authors: SiCheng Yang, Methawee Tantrawenith, Haolin Zhuang, Zhiyong Wu, Aolan Sun, Jianzong Wang, Ning Cheng, Huaizhen Tang, Xintao Zhao, Jie Wang, Helen Meng

Comments: 5 pages, 5 figures, INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[25] arXiv:2208.09775 [pdf, other]: Title: Visualising Model Training via Vowel Space for Text-To-Speech Systems

Authors: Binu Abeysinghe, Jesin James, Catherine I. Watson, Felix Marattukalam

Comments: Accepted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
