Audio and Speech Processing

Authors and titles for eess.AS in Apr 2020

[ total of 132 entries: 1-132 ]

[1] arXiv:2004.00001 [pdf, other]: Title: VaPar Synth -- A Variational Parametric Model for Audio Synthesis

Authors: Krishna Subramani, Preeti Rao, Alexandre D'Hooge

Comments: this https URL, Accepted in ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[2] arXiv:2004.00175 [pdf, other]: Title: Improved Source Counting and Separation for Monaural Mixture

Authors: Yiming Xiao, Haijian Zhang

Subjects: Audio and Speech Processing (eess.AS)
[3] arXiv:2004.00200 [pdf, other]: Title: On The Differences Between Song and Speech Emotion Recognition: Effect of Feature Sets, Feature Types, and Classifiers

Authors: Bagus Tris Atmaja, Masato Akagi

Comments: 2 Figures, 2 Tables

Journal-ref: 2020 IEEE REGION 10 CONFERENCE (TENCON), 968-972

Subjects: Audio and Speech Processing (eess.AS)
[4] arXiv:2004.00526 [pdf, other]: Title: Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms

Authors: Jee-weon Jung, Seung-bin Kim, Hye-jin Shim, Ju-ho Kim, Ha-Jin Yu

Comments: 5 pages, 1 figure, 5 tables, submitted to Interspeech 2020 as a conference paper

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[5] arXiv:2004.00910 [pdf, other]: Title: Improving auditory attention decoding performance of linear and non-linear methods using state-space model

Authors: Ali Aroudi, Tobias de Taillez, Simon Doclo

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[6] arXiv:2004.00932 [pdf, other]: Title: iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning

Authors: Haoyu Li, Szu-Wei Fu, Yu Tsao, Junichi Yamagishi

Comments: 5 pages, Submitted to INTERSPEECH 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7] arXiv:2004.00960 [pdf, other]: Title: The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment

Authors: Wei Zhou, Wilfried Michel, Kazuki Irie, Markus Kitza, Ralf Schlüter, Hermann Ney

Comments: accepted at ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8] arXiv:2004.00967 [pdf, other]: Title: Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model

Authors: Wei Zhou, Ralf Schlüter, Hermann Ney

Comments: accepted at ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9] arXiv:2004.01221 [pdf, other]: Title: Towards Relevance and Sequence Modeling in Language Recognition

Authors: Bharat Padi, Anand Mohan, Sriram Ganapathy

Comments: this https URL. Accepted to IEEE Transactions on Audio, Speech and Language Processing

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[10] arXiv:2004.01275 [pdf, other]: Title: AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App

Authors: Ali Imran, Iryna Posokhova, Haneya N. Qureshi, Usama Masood, Muhammad Sajid Riaz, Kamran Ali, Charles N. John, MD Iftikhar Hussain, Muhammad Nabeel

Comments: Accepted in Informatics in Medicine Unlocked 2020

Journal-ref: Informatics in Medicine Unlocked, vol. 20, p. 100378, 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[11] arXiv:2004.01495 [pdf, other]: Title: Can Machine Learning Be Used to Recognize and Diagnose Coughs?

Authors: Charles Bales, Muhammad Nabeel, Charles N. John, Usama Masood, Haneya N. Qureshi, Hasan Farooq, Iryna Posokhova, Ali Imran

Comments: Accepted in IEEE International Conference on E-Health and Bioengineering - EHB 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[12] arXiv:2004.01525 [pdf, ps, other]: Title: Towards democratizing music production with AI-Design of Variational Autoencoder-based Rhythm Generator as a DAW plugin

Authors: Nao Tokui

Comments: 4 pages

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[13] arXiv:2004.01546 [pdf, other]: Title: Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection

Authors: Tharindu Fernando, Sridha Sridharan, Mitchell McLaren, Darshana Priyasad, Simon Denman, Clinton Fookes

Journal-ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[14] arXiv:2004.01559 [pdf, other]: Title: Neural i-vectors

Authors: Ville Vestman, Kong Aik Lee, Tomi H. Kinnunen

Comments: Accepted to Odyssey 2020: The Speaker and Language Recognition Workshop. Version 2 (bugfix)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[15] arXiv:2004.01922 [pdf, other]: Title: Subband modeling for spoofing detection in automatic speaker verification

Authors: Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos

Comments: Accepted to the Speaker Odyssey (The Speaker and Language Recognition Workshop) 2020 conference. 8 pages

Subjects: Audio and Speech Processing (eess.AS)
[16] arXiv:2004.02191 [pdf, other]: Title: Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model

Authors: Xin Wang, Junichi Yamagishi

Comments: Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS)
[17] arXiv:2004.02355 [pdf, other]: Title: Deep Multilayer Perceptrons for Dimensional Speech Emotion Recognition

Authors: Bagus Tris Atmaja, Masato Akagi

Comments: 2 figures, 4 tables, submitted to EUSIPCO 2020

Journal-ref: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020

Subjects: Audio and Speech Processing (eess.AS)
[18] arXiv:2004.02420 [pdf, other]: Title: Simultaneous Denoising and Dereverberation Using Deep Embedding Features

Authors: Cunhang Fan, Jianhua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[19] arXiv:2004.02450 [pdf, other]: Title: A bio-inspired geometric model for sound reconstruction

Authors: Ugo Boscain (LJLL (UMR_7598), CNRS, CaGE), Dario Prandi (CNRS, L2S), Ludovic Sacchelli (LAGEPP), Giuseppina Turco (CNRS, LLF UMR7110)

Subjects: Audio and Speech Processing (eess.AS); Analysis of PDEs (math.AP); Optimization and Control (math.OC); Neurons and Cognition (q-bio.NC)
[20] arXiv:2004.02541 [pdf, other]: Title: Vocoder-Based Speech Synthesis from Silent Videos

Authors: Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen

Comments: Accepted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[21] arXiv:2004.02863 [pdf, other]: Title: Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

Authors: Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim

Comments: Accepted to Interspeech 2020. The codes are available at this https URL

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[22] arXiv:2004.03194 [pdf, other]: Title: Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

Authors: Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim

Comments: Accepted to Interspeech 2020

Journal-ref: Proc. Interspeech 2020, pp. 1501-1505

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[23] arXiv:2004.03428 [pdf, other]: Title: Universal Adversarial Perturbations Generative Network for Speaker Recognition

Authors: Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

Comments: Accepted by ICME2020

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[24] arXiv:2004.03434 [pdf, other]: Title: Learning to fool the speaker recognition

Authors: Jiguo Li, Xinfeng Zhang, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

Comments: Accepted by ICASSP2020

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[25] arXiv:2004.03437 [pdf, other]: Title: Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition

Authors: Yi Zheng, Xianjie Yang, Xuyong Dang

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[26] arXiv:2004.03512 [pdf, other]: Title: SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement

Authors: Robert Rehr, Timo Gerkmann

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, 2021. (c) 2021 IEEE

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[27] arXiv:2004.03586 [pdf, other]: Title: From Artificial Neural Networks to Deep Learning for Music Generation -- History, Concepts and Trends

Authors: Jean-Pierre Briot

Comments: To appear in the Special Issue on Art, Sound and Design in the Neural Computing and Applications Journal

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Machine Learning (stat.ML)
[28] arXiv:2004.03781 [pdf, other]: Title: Emotional Voice Conversion With Cycle-consistent Adversarial Network

Authors: Songxiang Liu, Yuewen Cao, Helen Meng

Comments: 5 pages

Subjects: Audio and Speech Processing (eess.AS)
[29] arXiv:2004.03782 [pdf, other]: Title: Multi-Target Emotional Voice Conversion With Neural Vocoders

Authors: Songxiang Liu, Yuewen Cao, Helen Meng

Comments: 7 pages

Subjects: Audio and Speech Processing (eess.AS)
[30] arXiv:2004.04001 [pdf, other]: Title: Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement

Authors: Haoyu Li, Junichi Yamagishi

Comments: 5 pages, Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS)
[31] arXiv:2004.04014 [pdf, other]: Title: Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

Authors: Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Comments: Accepted by Speaker Odyssey 2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[32] arXiv:2004.04040 [pdf, other]: Title: Investigation of Singing Voice Separation for Singing Voice Detection in Polyphonic Music

Authors: Yifu Sun, Xulong Zhang, Yi Yu, Xi Chen, Wei Li

Comments: Accepted by CSMT (The 9th Conference on Sound and Music Technology)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[33] arXiv:2004.04054 [pdf, other]: Title: Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition

Authors: A. Biswas, F. de Wet, E. van der Westhuizen, T.R. Niesler

Comments: 4th Code-Switch workshop, France

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[34] arXiv:2004.04072 [pdf, ps, other]: Title: CNN-MoE based framework for classification of respiratory anomalies and lung disease detection

Authors: Lam Pham, Huy Phan, Ramaswamy Palaniappan, Alfred Mertins, Ian McLoughlin

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[35] arXiv:2004.04095 [pdf, other]: Title: Deep Normalization for Speaker Vectors

Authors: Yunqi Cai, Lantian Li, Dong Wang, Andrew Abel

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[36] arXiv:2004.04096 [pdf, ps, other]: Title: Probabilistic embeddings for speaker diarization

Authors: Anna Silnova, Niko Brümmer, Johan Rohdin, Themos Stafylakis, Lukáš Burget

Comments: Awarded: Jack Godfrey Best Student Paper Award, at Odyssey 2020: The Speaker and Language Recognition Workshop, Tokyo

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[37] arXiv:2004.04098 [pdf, other]: Title: WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement

Authors: Tsun-An Hsieh, Hsin-Min Wang, Xugang Lu, Yu Tsao

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[38] arXiv:2004.04099 [pdf, ps, other]: Title: Keywords Extraction and Sentiment Analysis using Automatic Speech Recognition

Authors: Rachit Shukla

Comments: 23 pages, 20 figures. Based on the work done as a part of the Science Academies' Summer Research Fellowship Programme (SRFP '19) at Vijña Labs

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[39] arXiv:2004.04290 [pdf, other]: Title: An investigation of phone-based subword units for end-to-end speech recognition

Authors: Weiran Wang, Guangsen Wang, Aadyot Bhatnagar, Yingbo Zhou, Caiming Xiong, Richard Socher

Comments: Interspeech 2020 final version. Implementation for reproducing the results can be found at: this https URL

Subjects: Audio and Speech Processing (eess.AS)
[40] arXiv:2004.04371 [pdf, other]: Title: MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification

Authors: Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Comments: Accepted by IJCNN2022 (The 2022 International Joint Conference on Neural Networks)

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
[41] arXiv:2004.04410 [pdf, other]: Title: Att-HACK: An Expressive Speech Database with Social Attitudes

Authors: Clément Le Moine, Nicolas Obin

Comments: 5 pages, 5 figures

Subjects: Audio and Speech Processing (eess.AS)
[42] arXiv:2004.04459 [pdf, ps, other]: Title: Fast frequency discrimination and phoneme recognition using a biomimetic membrane coupled to a neural network

Authors: Woo Seok Lee, Hyunjae Kim, Andrew N. Cleland, Kang-Hun Ahn

Comments: 7 pages, 4 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Biological Physics (physics.bio-ph)
[43] arXiv:2004.04731 [pdf, other]: Title: Advancing Speech Synthesis using EEG

Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik

Comments: Under review

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[44] arXiv:2004.05274 [pdf, other]: Title: Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

Authors: Yu-An Chung, James Glass

Comments: Accepted to ACL 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[45] arXiv:2004.05830 [pdf, other]: Title: From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech

Authors: Hyeong-Seok Choi, Changdae Park, Kyogu Lee

Comments: 18 pages, 12 figures, Published as a conference paper at International Conference on Learning Representations (ICLR) 2020. (camera-ready version)

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[46] arXiv:2004.05989 [pdf, other]: Title: Data augmentation using generative networks to identify dementia

Authors: Bahman Mirheidari, Yilin Pan, Daniel Blackburn, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[47] arXiv:2004.06332 [src]: Title: Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment

Authors: Chao Ma, Dongmei Li, Xupeng Jia

Comments: This paper has been rejected by INTERSPEECH 2020. It has been modified extensively and submitted to APSIPA ASC 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[48] arXiv:2004.06338 [pdf, ps, other]: Title: Transformer based Grapheme-to-Phoneme Conversion

Authors: Sevinj Yolchuyeva, Géza Németh, Bálint Gyires-Tóth

Comments: INTERSPEECH 2019

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[49] arXiv:2004.06422 [pdf, other]: Title: An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification

Authors: Hemlata Tak, Jose Patino, Andreas Nautsch, Nicholas Evans, Massimiliano Todisco

Comments: Accepted to Speaker Odyssey (The Speaker and Language Recognition Workshop), 2020, 8 pages

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50] arXiv:2004.06480 [pdf, other]: Title: Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech

Authors: N. Wilkinson, A. Biswas, E. Yılmaz, F. de Wet, E. van der Westhuizen, T.R. Niesler

Comments: SLTU 2020. arXiv admin note: text overlap with arXiv:2003.03135

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[51] arXiv:2004.06579 [pdf, other]: Title: The Hearpiece database of individual transfer functions of an openly available in-the-ear earpiece for hearing device research

Authors: Florian Denk, Birger Kollmeier

Comments: 14 pages, 13 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[52] arXiv:2004.06756 [pdf, other]: Title: Speaker Diarization with Lexical Information

Authors: Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

Journal-ref: Interspeech 2019, 391-395

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[53] arXiv:2004.06833 [pdf, ps, other]: Title: Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge

Authors: Saturnino Luz, Fasih Haider, Sofia de la Fuente, Davida Fromm, Brian MacWhinney

Comments: To appear in the Proceedings of INTERSPEECH 2020, Oct 2020, Shanghai, China

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Machine Learning (stat.ML)
[54] arXiv:2004.07370 [pdf, other]: Title: F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

Authors: Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[55] arXiv:2004.07832 [pdf, other]: Title: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders

Authors: Yang Ai, Zhen-Hua Ling

Comments: Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56] arXiv:2004.07948 [pdf, other]: Title: Sound of Guns: Digital Forensics of Gun Audio Samples meets Artificial Intelligence

Authors: Simone Raponi, Isra Ali, Gabriele Oligeri

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[57] arXiv:2004.07992 [pdf, other]: Title: Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network

Authors: Mariana Rodrigues Makiuchi, Tifani Warnita, Nakamasa Inoue, Koichi Shinoda, Michitaka Yoshimura, Momoko Kitazawa, Kei Funaki, Yoko Eguchi, Taishiro Kishimoto

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[58] arXiv:2004.08248 [pdf, ps, other]: Title: Acoustical classification of different speech acts using nonlinear methods

Authors: Chirayata Bhattacharyya, Sourya Sengupta, Sayan Nag, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

Comments: 6 pages, 2 figures; Proceedings of WESPAC 2018, New Delhi, India, November 11-15, 2018

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Chaotic Dynamics (nlin.CD); Neurons and Cognition (q-bio.NC)
[59] arXiv:2004.08250 [pdf, other]: Title: How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition

Authors: George Sterpu, Christian Saam, Naomi Harte

Comments: in IEEE/ACM Transactions on Audio, Speech, and Language Processing (to appear)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[60] arXiv:2004.08287 [pdf, other]: Title: Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning

Authors: Jyotibdha Acharya, Arindam Basu

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[61] arXiv:2004.08326 [pdf, other]: Title: SpEx: Multi-Scale Time Domain Speaker Extraction Network

Authors: Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Comments: ACCEPTED in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[62] arXiv:2004.08531 [pdf, other]: Title: MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition

Authors: Somshubra Majumdar, Boris Ginsburg

Subjects: Audio and Speech Processing (eess.AS)
[63] arXiv:2004.08849 [pdf, other]: Title: The Attacker's Perspective on Automatic Speaker Verification: An Overview

Authors: Rohan Kumar Das, Xiaohai Tian, Tomi Kinnunen, Haizhou Li

Comments: 5 pages, 1 figure, Submitted to Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR)
[64] arXiv:2004.09347 [pdf, other]: Title: End-to-End Whisper to Natural Speech Conversion using Modified Transformer Network

Authors: Abhishek Niranjan, Mukesh Sharma, Sai Bharath Chandra Gutha, M Ali Basha Shaik

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[65] arXiv:2004.09571 [pdf, other]: Title: Language-agnostic Multilingual Modeling

Authors: Arindrima Datta, Bhuvana Ramabhadran, Jesse Emond, Anjuli Kannan, Brian Roark

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
[66] arXiv:2004.09584 [pdf, other]: Title: ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric

Authors: Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, Andrew Hines

Comments: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[67] arXiv:2004.09607 [pdf, other]: Title: Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System

Authors: Viet Lam Phung, Phan Huy Kinh, Anh Tuan Dinh, Quoc Bao Nguyen

Comments: 8 pages, 2 figures, submitted to Oriental COCOSDA

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[68] arXiv:2004.10120 [pdf, other]: Title: Vector Quantized Contrastive Predictive Coding for Template-based Music Generation

Authors: Gaëtan Hadjeres, Léopold Crestel

Comments: 15 pages, 13 figures

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[69] arXiv:2004.10246 [pdf, ps, other]: Title: Music Generation with Temporal Structure Augmentation

Authors: Shakeel Raja

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[70] arXiv:2004.10391 [pdf, other]: Title: Towards Linking the Lakh and IMSLP Datasets

Authors: TJ Tsai

Comments: 5 pages, 4 figures, 1 table. Accepted paper at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020

Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Image and Video Processing (eess.IV)
[71] arXiv:2004.10799 [pdf, other]: Title: Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription

Authors: Andrei Andrusenko, Aleksandr Laptev, Ivan Medennikov

Comments: Accepted by Interspeech 2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[72] arXiv:2004.10823 [pdf, other]: Title: Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit

Authors: Tomoki Koriyama, Hiroshi Saruwatari

Comments: 5 pages. Accepted by ICASSP2020

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[73] arXiv:2004.11012 [pdf, other]: Title: ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

Authors: Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma

Comments: Accepted by ISCSLP2021

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[74] arXiv:2004.11162 [pdf, other]: Title: Flexible framework for audio reconstruction

Authors: Ondřej Mokrý, Pavel Rajmic, Pavel Záviška

Journal-ref: 23rd International Conference on Digital Audio Effects (eDAFx2020)

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75] arXiv:2004.11284 [pdf, other]: Title: Unsupervised Speech Decomposition via Triple Information Bottleneck

Authors: Kaizhi Qian, Yang Zhang, Shiyu Chang, David Cox, Mark Hasegawa-Johnson

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[76] arXiv:2004.11544 [pdf, other]: Title: Towards Fast and Accurate Streaming End-to-End ASR

Authors: Bo Li, Shuo-yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu

Comments: Accepted in ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS)
[77] arXiv:2004.11956 [pdf, other]: Title: Binaural Audio Source Remixing with Microphone Array Listening Devices

Authors: Ryan M. Corey, Andrew C. Singer

Comments: To appear at ICASSP 2020

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[78] arXiv:2004.12046 [pdf, ps, other]: Title: Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-occurrence

Authors: Keisuke Imoto, Seisuke Kyochi

Comments: Accepted to IEICE Transactions on Information and Systems

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79] arXiv:2004.12071 [pdf, other]: Title: Active Voice Authentication

Authors: Zhong Meng, M Umair Bin Altaf, Biing-Hwang (Fred) Juang

Comments: 39 pages, 4 figures

Journal-ref: Digital Signal Processing, Volume 101, June 2020, 102672, ISSN 1051-2004

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[80] arXiv:2004.12261 [pdf, other]: Title: Enabling Fast and Universal Audio Adversarial Attack Using Generative Model

Authors: Yi Xie, Zhuohang Li, Cong Shi, Jian Liu, Yingying Chen, Bo Yuan

Comments: Published at AAAI21

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[81] arXiv:2004.12745 [pdf, other]: Title: Time-Frequency Analysis and Parameterisation of Knee Sounds for Non-invasive Detection of Osteoarthritis

Authors: Costas Yiallourides, Patrick A. Naylor

Comments: Submitted to IEEE Transactions on Biomedical Engineering

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[82] arXiv:2004.13172 [pdf, other]: Title: Autoencoding Neural Networks as Musical Audio Synthesizers

Authors: Joseph Colonel, Christopher Curro, Sam Keene

Journal-ref: Proceedings of the 21st International Conference on Digital Audio Effects (DAFx-18), 2018, pp. 40-44

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[83] arXiv:2004.13480 [pdf, other]: Title: L-Vector: Neural Label Embedding for Domain Adaptation

Authors: Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee

Comments: 5 pages, 2 figures, ICASSP 2020

Journal-ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[84] arXiv:2004.13521 [pdf, ps, other]: Title: Detect Language of Transliterated Texts

Authors: Sourav Sen

Comments: 10 pages, 8 figures, 3 tables

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[85] arXiv:2004.13522 [pdf, ps, other]: Title: Research on Modeling Units of Transformer Transducer for Mandarin Speech Recognition

Authors: Li Fu, Xiaoxiao Li, Libo Zi

Comments: 5 pages, 3 figures

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[86] arXiv:2004.13670 [pdf, other]: Title: Neural Speech Separation Using Spatially Distributed Microphones

Authors: Dongmei Wang, Zhuo Chen, Takuya Yoshioka

Comments: 5 pages, 2 figures, Interspeech2020

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[87] arXiv:2004.13764 [pdf, other]: Title: Conditional Spoken Digit Generation with StyleGAN

Authors: Kasperi Palkama, Lauri Juvela, Alexander Ilin

Comments: Interspeech2020 accepted version

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[88] arXiv:2004.14091 [pdf, other]: Title: Determined BSS based on time-frequency masking and its application to harmonic vector analysis

Authors: Kohei Yatabe, Daichi Kitamura

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[89] arXiv:2004.14617 [pdf, other]: Title: CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech

Authors: Sri Karlapati, Alexis Moinet, Arnaud Joly, Viacheslav Klimkov, Daniel Sáez-Trigueros, Thomas Drugman

Journal-ref: INTERSPEECH 2020: 4387-4391

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[90] arXiv:2004.14762 [pdf, other]: Title: Time-domain speaker extraction network

Authors: Chenglin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Comments: Published in ASRU 2019. arXiv admin note: text overlap with arXiv:2004.08326

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[91] arXiv:2004.14832 [pdf, other]: Title: A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications

Authors: Deepak Baby, Arthur Van Den Broucke, Sarah Verhulst

Subjects: Audio and Speech Processing (eess.AS); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Sound (cs.SD)
[92] arXiv:2004.14840 [pdf, other]: Title: Multiresolution and Multimodal Speech Recognition with Transformers

Authors: Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

Comments: Accepted for ACL 2020

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[93] arXiv:2004.14859 [pdf, other]: Title: Robust Phonetic Segmentation Using Spectral Transition measure for Non-Standard Recording Environments

Authors: Bhavik Vachhani, Chitralekha Bhat, Sunil Kopparapu

Comments: 6 pages, 6 figures

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94] arXiv:2004.03712 (cross-list from eess.SP) [pdf, other]: Title: Heart Sound Segmentation using Bidirectional LSTMs with Attention

Authors: Tharindu Fernando, Houman Ghaemmaghami, Simon Denman, Sridha Sridharan, Nayyar Hussain, Clinton Fookes

Comments: IEEE Journal of Biomedical and Health Informatics, 25 October 2019

Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[95] arXiv:2004.03926 (cross-list from eess.SP) [pdf, other]: Title: MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction

Authors: Robin Scheibler, Nobutaka Ono

Comments: 15 pages, 4 figures

Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2004.00132 (cross-list from cs.SD) [pdf, other]: Title: AM-MobileNet1D: A Portable Model for Speaker Recognition

Authors: João Antônio Chagas Nunes, David Macêdo, Cleber Zanchettin

Comments: 2020 International Joint Conference on Neural Networks (IJCNN)

Journal-ref: 2020 International Joint Conference on Neural Networks (IJCNN)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[97] arXiv:2004.01023 (cross-list from cs.MM) [pdf, other]: Title: Multi-Modal Video Forensic Platform for Investigating Post-Terrorist Attack Scenarios

Authors: Alexander Schindler, Andrew Lindley, Anahid Jalali, Martin Boyer, Sergiu Gordea, Ross King

Journal-ref: In Proceedings of the 11th ACM Multimedia Systems Conference (MMSys2020), June 06-11, 2020, Istanbul, Turkey

Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2004.02219 (cross-list from cs.CL) [pdf, other]: Title: Speaker Recognition using SincNet and X-Vector Fusion

Authors: Mayank Tripathi, Divyanshu Singh, Seba Susan

Comments: The 19th International Conference on Artificial Intelligence and Soft Computing

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2004.03413 (cross-list from cs.MM) [pdf, other]: Title: Direct Speech-to-image Translation

Authors: Jiguo Li, Xinfeng Zhang, Chuanmin Jia, Jizheng Xu, Li Zhang, Yue Wang, Siwei Ma, Wen Gao

Comments: Accepted by JSTSP

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100] arXiv:2004.03873 (cross-list from cs.SD) [pdf, other]: Title: Conditioned Source Separation for Music Instrument Performances

Authors: Olga Slizovskaia, Gloria Haro, Emilia Gómez

Comments: 14 pages, 5 figures, under review

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[101] arXiv:2004.04662 (cross-list from cs.LG) [pdf, other]: Title: Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Authors: Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Kārlis Freivalds

Comments: 35th AAAI Conference on Artificial Intelligence (AAAI-21)

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102] arXiv:2004.04972 (cross-list from cs.CL) [pdf, other]: Title: Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data

Authors: Soumi Maiti, Erik Marchi, Alistair Conkie

Comments: Accepted to IEEE ICASSP 2020

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103] arXiv:2004.05985 (cross-list from cs.CL) [pdf, ps, other]: Title: Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Authors: Łukasz Augustyniak, Piotr Szymanski, Mikołaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak

Comments: submitted to INTERSPEECH'20

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104] arXiv:2004.07070 (cross-list from cs.CL) [pdf, other]: Title: Analyzing analytical methods: The case of phonology in neural models of spoken language

Authors: Grzegorz Chrupała, Bertrand Higy, Afra Alishahi

Comments: ACL 2020

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105] arXiv:2004.07171 (cross-list from cs.SD) [pdf, other]: Title: Musical Features for Automatic Music Transcription Evaluation

Authors: Adrien Ycart, Lele Liu, Emmanouil Benetos, Marcus T. Pearce

Comments: Technical report

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[106] arXiv:2004.07301 (cross-list from cs.CV) [pdf, other]: Title: ESResNet: Environmental Sound Classification Based on Visual Domain Models

Authors: Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

Comments: 8 pages, 4 figures; submitted to ICPR 2020

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107] arXiv:2004.07442 (cross-list from cs.CR) [pdf, other]: Title: Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release

Authors: Yaowei Han, Sheng Li, Yang Cao, Qiang Ma, Masatoshi Yoshikawa

Comments: The paper has been accepted by the IEEE International Conference on Multimedia & Expo 2020 (ICME 2020)

Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108] arXiv:2004.07800 (cross-list from cs.HC) [pdf, other]: Title: Leveraging GANs to Improve Continuous Path Keyboard Input Models

Authors: Akash Mehra, Jerome R. Bellegarda, Ojas Bapat, Partha Lal, Xin Wang

Subjects: Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[109] arXiv:2004.07820 (cross-list from cs.SD) [pdf, ps, other]: Title: Speaker Recognition in Bengali Language from Nonlinear Features

Authors: Uddalok Sarkar, Soumyadeep Pal, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh

Comments: arXiv admin note: text overlap with arXiv:1612.00171, arXiv:1601.07709

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[110] arXiv:2004.08269 (cross-list from cs.SD) [pdf, other]: Title: Beat Detection and Automatic Annotation of the Music of Bharatanatyam Dance using Speech Recognition Techniques

Authors: Tanwi Mallick, Partha Pratim Das, Arun Kumar Majumdar

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111] arXiv:2004.09249 (cross-list from cs.SD) [pdf, other]: Title: CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Authors: Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[112] arXiv:2004.09476 (cross-list from cs.CV) [pdf, other]: Title: Music Gesture for Visual Sound Separation

Authors: Chuang Gan, Deng Huang, Hang Zhao, Joshua B. Tenenbaum, Antonio Torralba

Comments: CVPR 2020. Project page: this http URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113] arXiv:2004.10087 (cross-list from cs.CL) [pdf, other]: Title: AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling

Authors: Libo Qin, Xiao Xu, Wanxiang Che, Ting Liu

Comments: Accepted at Findings of EMNLP 2020. Data and code are available at this https URL

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114] arXiv:2004.10093 (cross-list from cs.CL) [pdf, other]: Title: Curriculum Pre-training for End-to-End Speech Translation

Authors: Chengyi Wang, Yu Wu, Shujie Liu, Ming Zhou, Zhenglu Yang

Comments: accepted by ACL2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115] arXiv:2004.10234 (cross-list from cs.CL) [pdf, ps, other]: Title: ESPnet-ST: All-in-One Speech Translation Toolkit

Authors: Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe

Comments: Accepted at ACL 2020 System Demonstration (update Table1, fix typo)

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116] arXiv:2004.10345 (cross-list from cs.MM) [pdf, other]: Title: MIDI-Sheet Music Alignment Using Bootleg Score Synthesis

Authors: Thitaree Tanprasert, Teerapat Jenrungrot, Meinard Mueller, T.J. Tsai

Comments: 8 pages, 6 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[117] arXiv:2004.10347 (cross-list from cs.MM) [pdf, other]: Title: MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music

Authors: Daniel Yang, Thitaree Tanprasert, Teerapat Jenrungrot, Mengyi Shan, TJ Tsai

Comments: 8 pages, 8 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[118] arXiv:2004.10454 (cross-list from cs.CL) [pdf, other]: Title: A Study of Non-autoregressive Model for Sequence Generation

Authors: Yi Ren, Jinglin Liu, Xu Tan, Zhou Zhao, Sheng Zhao, Tie-Yan Liu

Comments: Accepted by ACL 2020

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119] arXiv:2004.11419 (cross-list from cs.SD) [pdf, other]: Title: End-to-end speech-to-dialog-act recognition

Authors: Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[120] arXiv:2004.11724 (cross-list from cs.MM) [pdf, other]: Title: Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages

Authors: TJ Tsai, Daniel Yang, Mengyi Shan, Thitaree Tanprasert, Teerapat Jenrungrot

Comments: 13 pages, 8 figures, 3 tables. Accepted article in IEEE Transactions on Multimedia. arXiv admin note: text overlap with arXiv:2004.10347

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[121] arXiv:2004.12031 (cross-list from cs.LG) [pdf, ps, other]: Title: On the Role of Visual Cues in Audiovisual Speech Enhancement

Authors: Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz

Comments: ICASSP 2021

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122] arXiv:2004.12111 (cross-list from cs.SD) [pdf, ps, other]: Title: Jointly Trained Transformers models for Spoken Language Translation

Authors: Hari Krishna Vydana, Martin Karafiát, Katerina Zmolikova, Lukáš Burget, Honza Cernocky

Comments: 7 pages, 3 figures

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[123] arXiv:2004.12200 (cross-list from cs.SD) [pdf, other]: Title: Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting

Authors: Menglong Xu, Xiao-Lei Zhang

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124] arXiv:2004.12569 (cross-list from cs.MM) [pdf, ps, other]: Title: DWT-GBT-SVD-based Robust Speech Steganography

Authors: Noshin Amiri, Iman Naderi

Comments: 10 pages, 4 Figures

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125] arXiv:2004.13007 (cross-list from cs.IR) [pdf, ps, other]: Title: A session-based song recommendation approach involving user characterization along the play power-law distribution

Authors: Diego Sánchez-Moreno, Vivian F. López Batista, M. Dolores Muñoz Vicente, Ana B. Gil González, María N. Moreno-García

Comments: Accepted in Complexity (ISSN: 1099-0526)

Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126] arXiv:2004.13595 (cross-list from cs.SD) [pdf, other]: Title: Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise

Authors: Shan Yang, Yuxuan Wang, Lei Xie

Comments: submitted to IEEE SPL

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127] arXiv:2004.13780 (cross-list from cs.CV) [pdf, other]: Title: Cross-modal Speaker Verification and Recognition: A Multilingual Perspective

Authors: Muhammad Saad Saeed, Shah Nawaz, Pietro Morerio, Arif Mahmood, Ignazio Gallo, Muhammad Haroon Yousaf, Alessio Del Bue

Comments: Accepted: CVPRW

Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128] arXiv:2004.14228 (cross-list from cs.CL) [pdf, other]: Title: Meta-Transfer Learning for Code-Switched Speech Recognition

Authors: Genta Indra Winata, Samuel Cahyawijaya, Zhaojiang Lin, Zihan Liu, Peng Xu, Pascale Fung

Comments: Accepted in ACL 2020. The first two authors contributed equally to this work

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129] arXiv:2004.14326 (cross-list from cs.SD) [pdf, other]: Title: Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision

Authors: Soo-Whan Chung, Hong Goo Kang, Joon Son Chung

Comments: Under submission as a conference paper

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[130] arXiv:2004.14368 (cross-list from cs.CV) [pdf, other]: Title: VGGSound: A Large-scale Audio-Visual Dataset

Authors: Honglie Chen, Weidi Xie, Andrea Vedaldi, Andrew Zisserman

Comments: ICASSP2020

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131] arXiv:2004.14846 (cross-list from cs.CL) [pdf, other]: Title: The role of context in neural pitch accent detection in English

Authors: Elizabeth Nielsen, Mark Steedman, Sharon Goldwater

Journal-ref: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132] arXiv:2004.14858 (cross-list from cs.MM) [pdf, other]: Title: MuSe 2020 -- The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop

Authors: Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Björn W. Schuller, Iulia Lefter, Erik Cambria, Ioannis Kompatsiaris

Comments: Baseline Paper MuSe 2020, MuSe Workshop Challenge, ACM Multimedia

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
