Audio and Speech Processing

Authors and titles for eess.AS in Aug 2022

[ total of 149 entries: 1-25 | 26-50 | 51-75 | 76-100 | ... | 126-149 ]
[1]  arXiv:2208.00987 [pdf, other]
Title: DENT-DDSP: Data-efficient noisy speech generator using differentiable digital signal processors for explicit distortion modelling and noise-robust speech recognition
Authors: Z. Guo, C. Chen, E.S. Chng
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[2]  arXiv:2208.01041 [pdf, ps, other]
Title: Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review
Comments: 41 pages, 7 figures, 4 tables
Subjects: Audio and Speech Processing (eess.AS); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD)
[3]  arXiv:2208.01555 [pdf, other]
Title: Low-complexity CNNs for Acoustic Scene Classification
Comments: Technical Report DCASE 2022 TASK 1. arXiv admin note: substantial text overlap with arXiv:2207.11529
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[4]  arXiv:2208.02189 [pdf, other]
Title: A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Comments: Accepted by INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[5]  arXiv:2208.02406 [pdf, ps, other]
Title: Domestic Activity Clustering from Audio via Depthwise Separable Convolutional Autoencoder Network
Comments: 6 pages, 5 figures, 4 tables. Accepted by IEEE MMSP 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[6]  arXiv:2208.02778 [pdf, other]
Title: Attention and DCT based Global Context Modeling for Text-independent Speaker Recognition
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7]  arXiv:2208.03023 [pdf, other]
Title: AID: Open-source Anechoic Interferer Dataset
Comments: Accepted for publication at IWAENC 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2208.03421 [pdf, other]
Title: SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9]  arXiv:2208.04101 [pdf, other]
Title: FRA-RIR: Fast Random Approximation of the Image-source Method
Authors: Yi Luo, Jianwei Yu
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[10]  arXiv:2208.04622 [pdf, other]
Title: An Anchor-Free Detector for Continuous Speech Keyword Spotting
Comments: Accepted by Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[11]  arXiv:2208.04626 [pdf, ps, other]
Title: Recycling an anechoic pre-trained speech separation deep neural network for binaural dereverberation of a single source
Comments: 15 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[12]  arXiv:2208.04654 [pdf, other]
Title: Extending GCC-PHAT using Shift Equivariant Neural Networks
Comments: Proceedings of INTERSPEECH
Journal-ref: Proc. Interspeech 2022, 1791-1795
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[13]  arXiv:2208.05122 [pdf, other]
Title: Improving Hypernasality Estimation with Automatic Speech Recognition in Cleft Palate Speech
Comments: Accepted by InterSpeech 2022
Subjects: Audio and Speech Processing (eess.AS)
[14]  arXiv:2208.05184 [pdf, ps, other]
Title: Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source
Comments: 25 pages, 7 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[15]  arXiv:2208.05413 [pdf, other]
Title: Non-Contrastive Self-Supervised Learning of Utterance-Level Speech Representations
Comments: Accepted at Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[16]  arXiv:2208.05445 [pdf, other]
Title: Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech
Comments: EARLY ACCESS of IEEE JSTSP Special Issue on Self-Supervised Learning for Speech and Audio Processing
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[17]  arXiv:2208.05735 [pdf, other]
Title: Chewing Detection from Commercial Smart-glasses
Comments: 6 pages, 4 figures, 1 table, conference
Journal-ref: Proceedings of the 7th International Workshop on Multimedia Assisted Dietary Management (MADiMa '22), October 10, 2022, Lisboa, Portugal
Subjects: Audio and Speech Processing (eess.AS)
[18]  arXiv:2208.05782 [pdf, other]
Title: Comparison and Analysis of New Curriculum Criteria for End-to-End ASR
Comments: 5 pages, 2 figures, in Proceedings Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[19]  arXiv:2208.05830 [pdf, other]
Title: Speech Enhancement and Dereverberation with Diffusion-based Generative Models
Comments: Accepted version
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[20]  arXiv:2208.07282 [pdf, other]
Title: Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-To-End Audio Style Transfer
Comments: A revised version of this work has been accepted to the 154th AES Convention. To cite this work, please refer to the AES manuscript available at this https URL ; 12 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[21]  arXiv:2208.07446 [pdf, other]
Title: C3-DINO: Joint Contrastive and Non-contrastive Self-Supervised Learning for Speaker Verification
Comments: Accepted to IEEE Journal of Selected Topics in Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[22]  arXiv:2208.07657 [pdf, other]
Title: Uconv-Conformer: High Reduction of Input Sequence Length for End-to-End Speech Recognition
Comments: 5 pages, 1 figure, accepted by ICASSP 2023
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[23]  arXiv:2208.08012 [pdf, other]
Title: Disentangled Speaker Representation Learning via Mutual Information Minimization
Comments: Accepted by APSIPA ASC 2022. Camera-ready. 8 pages, 4 figures, and 1 table
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24]  arXiv:2208.08757 [pdf, other]
Title: Speech Representation Disentanglement with Adversarial Mutual Information Learning for One-shot Voice Conversion
Comments: 5 pages, 5 figures, INTERSPEECH 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[25]  arXiv:2208.09775 [pdf, other]
Title: Visualising Model Training via Vowel Space for Text-To-Speech Systems
Comments: Accepted to Interspeech 2022
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)