Audio and Speech Processing

Authors and titles for eess.AS in Apr 2020, skipping first 50

Total of 132 entries, 25 per page; this page lists entries 51-75.

[51] arXiv:2004.06579 [pdf, other]: Title: The Hearpiece database of individual transfer functions of an openly available in-the-ear earpiece for hearing device research

Authors: Florian Denk, Birger Kollmeier

Comments: 14 pages, 13 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[52] arXiv:2004.06756 [pdf, other]: Title: Speaker Diarization with Lexical Information

Authors: Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

Journal-ref: Interspeech 2019, 391-395

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[53] arXiv:2004.06833 [pdf, ps, other]: Title: Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

Authors: Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney

Comments: To appear in the Proceedings of INTERSPEECH 2020, Oct 2020, Shanghai, China

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Machine Learning (stat.ML)
[54] arXiv:2004.07370 [pdf, other]: Title: F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

Authors: Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[55] arXiv:2004.07832 [pdf, other]: Title: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders

Authors: Yang Ai, Zhen-Hua Ling

Comments: Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56] arXiv:2004.07948 [pdf, other]: Title: Sound of Guns: Digital Forensics of Gun Audio Samples meets Artificial Intelligence

Authors: Simone Raponi, Isra Ali, Gabriele Oligeri

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[57] arXiv:2004.07992 [pdf, other]: Title: Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network

Authors: Mariana Rodrigues Makiuchi, Tifani Warnita, Nakamasa Inoue, Koichi Shinoda, Michitaka Yoshimura, Momoko Kitazawa, Kei Funaki, Yoko Eguchi, Taishiro Kishimoto

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[58] arXiv:2004.08248 [pdf, ps, other]: Title: Acoustical classification of different speech acts using nonlinear methods

Authors: Chirayata Bhattacharyya, Sourya Sengupta, Sayan Nag, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

Comments: 6 pages, 2 figures; Proceedings of WESPAC 2018, New Delhi, India, November 11-15, 2018

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Chaotic Dynamics (nlin.CD); Neurons and Cognition (q-bio.NC)
[59] arXiv:2004.08250 [pdf, other]: Title: How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

Authors: George Sterpu, Christian Saam, Naomi Harte

Comments: in IEEE/ACM Transactions on Audio, Speech, and Language Processing (to appear)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[60] arXiv:2004.08287 [pdf, other]: Title: Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning

Authors: Jyotibdha Acharya, Arindam Basu

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[61] arXiv:2004.08326 [pdf, other]: Title: SpEx: Multi-Scale Time Domain Speaker Extraction Network

Authors: Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Comments: ACCEPTED in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[62] arXiv:2004.08531 [pdf, other]: Title: MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition

Authors: Somshubra Majumdar, Boris Ginsburg

Subjects: Audio and Speech Processing (eess.AS)
[63] arXiv:2004.08849 [pdf, other]: Title: The Attacker's Perspective on Automatic Speaker Verification: An Overview

Authors: Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li

Comments: 5 pages, 1 figure, Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR)
[64] arXiv:2004.09347 [pdf, other]: Title: End-to-End Whisper to Natural Speech Conversion using Modified Transformer Network

Authors: Abhishek Niranjan, Mukesh Sharma, Sai Bharath Chandra Gutha, M Ali Basha Shaik

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[65] arXiv:2004.09571 [pdf, other]: Title: Language-agnostic Multilingual Modeling

Authors: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
[66] arXiv:2004.09584 [pdf, other]: Title: ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

Authors: Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, Andrew Hines

Comments: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[67] arXiv:2004.09607 [pdf, other]: Title: Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System

Authors: Viet Lam Phung, Phan Huy Kinh, Anh Tuan Dinh, Quoc Bao Nguyen

Comments: 8 pages, 2 figures, submit to Oriental Cocosda

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[68] arXiv:2004.10120 [pdf, other]: Title: Vector Quantized Contrastive Predictive Coding for Template-based Music Generation

Authors: Gaëtan Hadjeres, Léopold Crestel

Comments: 15 pages, 13 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[69] arXiv:2004.10246 [pdf, ps, other]: Title: Music Generation with Temporal Structure Augmentation

Authors: Shakeel Raja

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[70] arXiv:2004.10391 [pdf, other]: Title: Towards Linking the Lakh and IMSLP Datasets

Authors: TJ Tsai

Comments: 5 pages, 4 figures, 1 table. Accepted paper at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Image and Video Processing (eess.IV)
[71] arXiv:2004.10799 [pdf, other]: Title: Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

Authors: Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov

Comments: Accepted by Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[72] arXiv:2004.10823 [pdf, other]: Title: Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit

Authors: Tomoki Koriyama, Hiroshi Saruwatari

Comments: 5 pages. Accepted by ICASSP2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[73] arXiv:2004.11012 [pdf, other]: Title: ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

Authors: Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma

Comments: Accepted by ISCSLP2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[74] arXiv:2004.11162 [pdf, other]: Title: Flexible framework for audio reconstruction

Authors: Ondřej Mokrý, Pavel Rajmic, Pavel Záviška

Journal-ref: 23rd International Conference on Digital Audio Effects (eDAFx2020)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75] arXiv:2004.11284 [pdf, other]: Title: Unsupervised Speech Decomposition via Triple Information Bottleneck

Authors: Kaizhi Qian, Yang Zhang, Shiyu Chang, David Cox, Mark Hasegawa-Johnson

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)

