Audio and Speech Processing

Authors and titles for eess.AS in Apr 2020

[ total of 132 entries: 1-132 ]
[1]  arXiv:2004.00001 [pdf, other]
Title: VaPar Synth -- A Variational Parametric Model for Audio Synthesis
Comments: this https URL, Accepted in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[2]  arXiv:2004.00175 [pdf, other]
Title: Improved Source Counting and Separation for Monaural Mixture
Subjects: Audio and Speech Processing (eess.AS)
[3]  arXiv:2004.00200 [pdf, other]
Title: On The Differences Between Song and Speech Emotion Recognition: Effect of Feature Sets, Feature Types, and Classifiers
Comments: 2 Figures, 2 Tables
Journal-ref: 2020 IEEE REGION 10 CONFERENCE (TENCON), 968-972
Subjects: Audio and Speech Processing (eess.AS)
[4]  arXiv:2004.00526 [pdf, other]
Title: Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms
Comments: 5 pages, 1 figure, 5 tables, submitted to Interspeech 2020 as a conference paper
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[5]  arXiv:2004.00910 [pdf, other]
Title: Improving auditory attention decoding performance of linear and non-linear methods using state-space model
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[6]  arXiv:2004.00932 [pdf, other]
Title: iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning
Comments: 5 pages, Submitted to INTERSPEECH 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[7]  arXiv:2004.00960 [pdf, other]
Title: The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
Comments: accepted at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2004.00967 [pdf, other]
Title: Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model
Comments: accepted at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9]  arXiv:2004.01221 [pdf, other]
Title: Towards Relevance and Sequence Modeling in Language Recognition
Comments: this https URL. Accepted to IEEE Transactions on Audio, Speech and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[10]  arXiv:2004.01275 [pdf, other]
Title: AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App
Comments: Accepted in Informatics in Medicine Unlocked 2020
Journal-ref: Informatics in Medicine Unlocked, vol. 20, p. 100378, 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[11]  arXiv:2004.01495 [pdf, other]
Title: Can Machine Learning Be Used to Recognize and Diagnose Coughs?
Comments: Accepted in IEEE International Conference on E-Health and Bioengineering - EHB 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[12]  arXiv:2004.01525 [pdf, ps, other]
Title: Towards democratizing music production with AI-Design of Variational Autoencoder-based Rhythm Generator as a DAW plugin
Authors: Nao Tokui
Comments: 4 pages
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[13]  arXiv:2004.01546 [pdf, other]
Title: Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection
Journal-ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[14]  arXiv:2004.01559 [pdf, other]
Title: Neural i-vectors
Comments: Accepted to Odyssey 2020: The Speaker and Language Recognition Workshop. Version 2 (bugfix)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[15]  arXiv:2004.01922 [pdf, other]
Title: Subband modeling for spoofing detection in automatic speaker verification
Comments: Accepted to the Speaker Odyssey (The Speaker and Language Recognition Workshop) 2020 conference. 8 pages
Subjects: Audio and Speech Processing (eess.AS)
[16]  arXiv:2004.02191 [pdf, other]
Title: Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model
Comments: Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS)
[17]  arXiv:2004.02355 [pdf, other]
Title: Deep Multilayer Perceptrons for Dimensional Speech Emotion Recognition
Comments: 2 figures, 4 tables, submitted to EUSIPCO 2020
Journal-ref: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
Subjects: Audio and Speech Processing (eess.AS)
[18]  arXiv:2004.02420 [pdf, other]
Title: Simultaneous Denoising and Dereverberation Using Deep Embedding Features
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[19]  arXiv:2004.02450 [pdf, other]
Title: A bio-inspired geometric model for sound reconstruction
Authors: Ugo Boscain (LJLL (UMR_7598), CNRS, CaGE), Dario Prandi (CNRS, L2S), Ludovic Sacchelli (LAGEPP), Giuseppina Turco (CNRS, LLF UMR7110)
Subjects: Audio and Speech Processing (eess.AS); Analysis of PDEs (math.AP); Optimization and Control (math.OC); Neurons and Cognition (q-bio.NC)
[20]  arXiv:2004.02541 [pdf, other]
Title: Vocoder-Based Speech Synthesis from Silent Videos
Comments: Accepted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
[21]  arXiv:2004.02863 [pdf, other]
Title: Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs
Comments: Accepted to Interspeech 2020. The codes are available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[22]  arXiv:2004.03194 [pdf, other]
Title: Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances
Comments: Accepted to Interspeech 2020
Journal-ref: Proc. Interspeech 2020, pp. 1501-1505
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[23]  arXiv:2004.03428 [pdf, other]
Title: Universal Adversarial Perturbations Generative Network for Speaker Recognition
Comments: Accepted by ICME2020
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[24]  arXiv:2004.03434 [pdf, other]
Title: Learning to fool the speaker recognition
Comments: Accepted by ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[25]  arXiv:2004.03437 [pdf, other]
Title: Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[26]  arXiv:2004.03512 [pdf, other]
Title: SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, 2021. (c) 2021 IEEE
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[27]  arXiv:2004.03586 [pdf, other]
Title: From Artificial Neural Networks to Deep Learning for Music Generation -- History, Concepts and Trends
Comments: To appear in the Special Issue on Art, Sound and Design in the Neural Computing and Applications Journal
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Machine Learning (stat.ML)
[28]  arXiv:2004.03781 [pdf, other]
Title: Emotional Voice Conversion With Cycle-consistent Adversarial Network
Comments: 5 pages
Subjects: Audio and Speech Processing (eess.AS)
[29]  arXiv:2004.03782 [pdf, other]
Title: Multi-Target Emotional Voice Conversion With Neural Vocoders
Comments: 7 pages
Subjects: Audio and Speech Processing (eess.AS)
[30]  arXiv:2004.04001 [pdf, other]
Title: Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement
Comments: 5 pages, Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS)
[31]  arXiv:2004.04014 [pdf, other]
Title: Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification
Comments: Accepted by Speaker Odyssey 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[32]  arXiv:2004.04040 [pdf, other]
Title: Investigation of Singing Voice Separation for Singing Voice Detection in Polyphonic Music
Comments: Accepted by CSMT (The 9th Conference on Sound and Music Technology)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[33]  arXiv:2004.04054 [pdf, other]
Title: Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition
Comments: 4th Code-Switch workshop, France
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[34]  arXiv:2004.04072 [pdf, ps, other]
Title: CNN-MoE based framework for classification of respiratory anomalies and lung disease detection
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[35]  arXiv:2004.04095 [pdf, other]
Title: Deep Normalization for Speaker Vectors
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[36]  arXiv:2004.04096 [pdf, ps, other]
Title: Probabilistic embeddings for speaker diarization
Comments: Awarded: Jack Godfrey Best Student Paper Award, at Odyssey 2020: The Speaker and Language Recognition Workshop, Tokyo
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[37]  arXiv:2004.04098 [pdf, other]
Title: WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[38]  arXiv:2004.04099 [pdf, ps, other]
Title: Keywords Extraction and Sentiment Analysis using Automatic Speech Recognition
Authors: Rachit Shukla
Comments: 23 pages, 20 figures. Based on the work done as a part of the Science Academies' Summer Research Fellowship Programme (SRFP '19) at Vijña Labs
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[39]  arXiv:2004.04290 [pdf, other]
Title: An investigation of phone-based subword units for end-to-end speech recognition
Comments: Interspeech 2020 final version. Implementation for reproducing the results can be found at: this https URL
Subjects: Audio and Speech Processing (eess.AS)
[40]  arXiv:2004.04371 [pdf, other]
Title: MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification
Comments: Accepted by IJCNN2022 (The 2022 International Joint Conference on Neural Networks)
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
[41]  arXiv:2004.04410 [pdf, other]
Title: Att-HACK: An Expressive Speech Database with Social Attitudes
Comments: 5 pages, 5 figures
Subjects: Audio and Speech Processing (eess.AS)
[42]  arXiv:2004.04459 [pdf, ps, other]
Title: Fast frequency discrimination and phoneme recognition using a biomimetic membrane coupled to a neural network
Comments: 7 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Biological Physics (physics.bio-ph)
[43]  arXiv:2004.04731 [pdf, other]
Title: Advancing Speech Synthesis using EEG
Comments: Under review
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[44]  arXiv:2004.05274 [pdf, other]
Title: Improved Speech Representations with Multi-Target Autoregressive Predictive Coding
Comments: Accepted to ACL 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[45]  arXiv:2004.05830 [pdf, other]
Title: From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech
Comments: 18 pages, 12 figures, Published as a conference paper at International Conference on Learning Representations (ICLR) 2020. (camera-ready version)
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[46]  arXiv:2004.05989 [pdf, other]
Title: Data augmentation using generative networks to identify dementia
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[47]  arXiv:2004.06332 [src]
Title: Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment
Comments: This paper has been rejected by INTERSPEECH 2020. It has been modified extensively and submitted to APSIPA ASC 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[48]  arXiv:2004.06338 [pdf, ps, other]
Title: Transformer based Grapheme-to-Phoneme Conversion
Comments: INTERSPEECH 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[49]  arXiv:2004.06422 [pdf, other]
Title: An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification
Comments: Accepted to Speaker Odyssey (The Speaker and Language Recognition Workshop), 2020, 8 pages
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50]  arXiv:2004.06480 [pdf, other]
Title: Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech
Comments: SLTU 2020. arXiv admin note: text overlap with arXiv:2003.03135
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[51]  arXiv:2004.06579 [pdf, other]
Title: The Hearpiece database of individual transfer functions of an openly available in-the-ear earpiece for hearing device research
Comments: 14 pages, 13 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[52]  arXiv:2004.06756 [pdf, other]
Title: Speaker Diarization with Lexical Information
Journal-ref: Interspeech 2019, 391-395
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[53]  arXiv:2004.06833 [pdf, ps, other]
Title: Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge
Comments: To appear in the Proceedings of INTERSPEECH 2020, Oct 2020, Shanghai, China
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Machine Learning (stat.ML)
[54]  arXiv:2004.07370 [pdf, other]
Title: F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[55]  arXiv:2004.07832 [pdf, other]
Title: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders
Comments: Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[56]  arXiv:2004.07948 [pdf, other]
Title: Sound of Guns: Digital Forensics of Gun Audio Samples meets Artificial Intelligence
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[57]  arXiv:2004.07992 [pdf, other]
Title: Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[58]  arXiv:2004.08248 [pdf, ps, other]
Title: Acoustical classification of different speech acts using nonlinear methods
Comments: 6 pages, 2 figures; Proceedings of WESPAC 2018, New Delhi, India, November 11-15, 2018
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Chaotic Dynamics (nlin.CD); Neurons and Cognition (q-bio.NC)
[59]  arXiv:2004.08250 [pdf, other]
Title: How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
Comments: in IEEE/ACM Transactions on Audio, Speech, and Language Processing (to appear)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[60]  arXiv:2004.08287 [pdf, other]
Title: Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[61]  arXiv:2004.08326 [pdf, other]
Title: SpEx: Multi-Scale Time Domain Speaker Extraction Network
Comments: ACCEPTED in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[62]  arXiv:2004.08531 [pdf, other]
Title: MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition
Subjects: Audio and Speech Processing (eess.AS)
[63]  arXiv:2004.08849 [pdf, other]
Title: The Attacker's Perspective on Automatic Speaker Verification: An Overview
Comments: 5 pages, 1 figure, Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR)
[64]  arXiv:2004.09347 [pdf, other]
Title: End-to-End Whisper to Natural Speech Conversion using Modified Transformer Network
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[65]  arXiv:2004.09571 [pdf, other]
Title: Language-agnostic Multilingual Modeling
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
[66]  arXiv:2004.09584 [pdf, other]
Title: ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
Comments: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[67]  arXiv:2004.09607 [pdf, other]
Title: Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System
Comments: 8 pages, 2 figures, submit to Oriental Cocosda
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[68]  arXiv:2004.10120 [pdf, other]
Title: Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Comments: 15 pages, 13 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[69]  arXiv:2004.10246 [pdf, ps, other]
Title: Music Generation with Temporal Structure Augmentation
Authors: Shakeel Raja
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[70]  arXiv:2004.10391 [pdf, other]
Title: Towards Linking the Lakh and IMSLP Datasets
Authors: TJ Tsai
Comments: 5 pages, 4 figures, 1 table. Accepted paper at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Image and Video Processing (eess.IV)
[71]  arXiv:2004.10799 [pdf, other]
Title: Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription
Comments: Accepted by Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[72]  arXiv:2004.10823 [pdf, other]
Title: Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit
Comments: 5 pages. Accepted by ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[73]  arXiv:2004.11012 [pdf, other]
Title: ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders
Comments: Accepted by ISCSLP2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[74]  arXiv:2004.11162 [pdf, other]
Title: Flexible framework for audio reconstruction
Journal-ref: 23rd International Conference on Digital Audio Effects (eDAFx2020)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[75]  arXiv:2004.11284 [pdf, other]
Title: Unsupervised Speech Decomposition via Triple Information Bottleneck
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[76]  arXiv:2004.11544 [pdf, other]
Title: Towards Fast and Accurate Streaming End-to-End ASR
Comments: Accepted in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS)
[77]  arXiv:2004.11956 [pdf, other]
Title: Binaural Audio Source Remixing with Microphone Array Listening Devices
Comments: To appear at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[78]  arXiv:2004.12046 [pdf, ps, other]
Title: Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-occurrence
Comments: Accepted to IEICE Transactions on Information and Systems
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79]  arXiv:2004.12071 [pdf, other]
Title: Active Voice Authentication
Comments: 39 pages, 4 figures
Journal-ref: Digital Signal Processing, Volume 101, June 2020, 102672, ISSN 1051-2004
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[80]  arXiv:2004.12261 [pdf, other]
Title: Enabling Fast and Universal Audio Adversarial Attack Using Generative Model
Comments: Publish on AAAI21
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[81]  arXiv:2004.12745 [pdf, other]
Title: Time-Frequency Analysis and Parameterisation of Knee Sounds for Non-invasive Detection of Osteoarthritis
Comments: Submitted to IEEE Transactions on Biomedical Engineering
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[82]  arXiv:2004.13172 [pdf, other]
Title: Autoencoding Neural Networks as Musical Audio Synthesizers
Journal-ref: Proceedings of the 21st International Conference on Digital Audio Effects (DAFx-18), 2018, pp40-44
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[83]  arXiv:2004.13480 [pdf, other]
Title: L-Vector: Neural Label Embedding for Domain Adaptation
Comments: 5 pages, 2 figure, ICASSP 2020
Journal-ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[84]  arXiv:2004.13521 [pdf, ps, other]
Title: Detect Language of Transliterated Texts
Authors: Sourav Sen
Comments: 10 pages, 8 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[85]  arXiv:2004.13522 [pdf, ps, other]
Title: Research on Modeling Units of Transformer Transducer for Mandarin Speech Recognition
Comments: 5 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[86]  arXiv:2004.13670 [pdf, other]
Title: Neural Speech Separation Using Spatially Distributed Microphones
Comments: 5 pages, 2 figures, Interspeech2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[87]  arXiv:2004.13764 [pdf, other]
Title: Conditional Spoken Digit Generation with StyleGAN
Comments: Interspeech2020 accepted version
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[88]  arXiv:2004.14091 [pdf, other]
Title: Determined BSS based on time-frequency masking and its application to harmonic vector analysis
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[89]  arXiv:2004.14617 [pdf, other]
Title: CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech
Journal-ref: INTERSPEECH 2020: 4387-4391
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[90]  arXiv:2004.14762 [pdf, other]
Title: Time-domain speaker extraction network
Comments: Published in ASRU 2019. arXiv admin note: text overlap with arXiv:2004.08326
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[91]  arXiv:2004.14832 [pdf, other]
Title: A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications
Subjects: Audio and Speech Processing (eess.AS); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Sound (cs.SD)
[92]  arXiv:2004.14840 [pdf, other]
Title: Multiresolution and Multimodal Speech Recognition with Transformers
Comments: Accepted for ACL 2020
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[93]  arXiv:2004.14859 [pdf, other]
Title: Robust Phonetic Segmentation Using Spectral Transition measure for Non-Standard Recording Environments
Comments: 6 pages, 6 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94]  arXiv:2004.03712 (cross-list from eess.SP) [pdf, other]
Title: Heart Sound Segmentation using Bidirectional LSTMs with Attention
Comments: IEEE Journal of Biomedical and Health Informatics, 25 October 2019
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
[95]  arXiv:2004.03926 (cross-list from eess.SP) [pdf, other]
Title: MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction
Comments: 15 pages, 4 figures
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96]  arXiv:2004.00132 (cross-list from cs.SD) [pdf, other]
Title: AM-MobileNet1D: A Portable Model for Speaker Recognition
Comments: 2020 International Joint Conference on Neural Networks (IJCNN)
Journal-ref: 2020 International Joint Conference on Neural Networks (IJCNN)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[97]  arXiv:2004.01023 (cross-list from cs.MM) [pdf, other]
Title: Multi-Modal Video Forensic Platform for Investigating Post-Terrorist Attack Scenarios
Journal-ref: In Proceedings of the 11th ACM Multimedia Systems Conference (MMSys2020), June 06-11, 2020, Istanbul, Turkey
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98]  arXiv:2004.02219 (cross-list from cs.CL) [pdf, other]
Title: Speaker Recognition using SincNet and X-Vector Fusion
Comments: The 19th International Conference on Artificial Intelligence and Soft Computing
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99]  arXiv:2004.03413 (cross-list from cs.MM) [pdf, other]
Title: Direct Speech-to-image Translation
Comments: Accepted by JSTSP
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100]  arXiv:2004.03873 (cross-list from cs.SD) [pdf, other]
Title: Conditioned Source Separation for Music Instrument Performances
Comments: 14 pages, 5 figures, under review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[101]  arXiv:2004.04662 (cross-list from cs.LG) [pdf, other]
Title: Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
Comments: 35th AAAI Conference on Artificial Intelligence (AAAI-21)
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[102]  arXiv:2004.04972 (cross-list from cs.CL) [pdf, other]
Title: Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data
Comments: Accepted to IEEE ICASSP 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103]  arXiv:2004.05985 (cross-list from cs.CL) [pdf, ps, other]
Title: Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?
Comments: submitted to INTERSPEECH'20
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[104]  arXiv:2004.07070 (cross-list from cs.CL) [pdf, other]
Title: Analyzing analytical methods: The case of phonology in neural models of spoken language
Comments: ACL 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105]  arXiv:2004.07171 (cross-list from cs.SD) [pdf, other]
Title: Musical Features for Automatic Music Transcription Evaluation
Comments: Technical report
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[106]  arXiv:2004.07301 (cross-list from cs.CV) [pdf, other]
Title: ESResNet: Environmental Sound Classification Based on Visual Domain Models
Comments: 8 pages, 4 figures; submitted to ICPR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107]  arXiv:2004.07442 (cross-list from cs.CR) [pdf, other]
Title: Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release
Comments: The paper has been accepted by the IEEE International Conference on Multimedia & Expo 2020(ICME 2020)
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108]  arXiv:2004.07800 (cross-list from cs.HC) [pdf, other]
Title: Leveraging GANs to Improve Continuous Path Keyboard Input Models
Subjects: Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[109]  arXiv:2004.07820 (cross-list from cs.SD) [pdf, ps, other]
Title: Speaker Recognition in Bengali Language from Nonlinear Features
Comments: arXiv admin note: text overlap with arXiv:1612.00171, arXiv:1601.07709
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[110]  arXiv:2004.08269 (cross-list from cs.SD) [pdf, other]
Title: Beat Detection and Automatic Annotation of the Music of Bharatanatyam Dance using Speech Recognition Techniques
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111]  arXiv:2004.09249 (cross-list from cs.SD) [pdf, other]
Title: CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[112]  arXiv:2004.09476 (cross-list from cs.CV) [pdf, other]
Title: Music Gesture for Visual Sound Separation
Comments: CVPR 2020. Project page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[113]  arXiv:2004.10087 (cross-list from cs.CL) [pdf, other]
Title: AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling
Comments: Accepted at Findings of EMNLP 2020. Data and code are available at this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114]  arXiv:2004.10093 (cross-list from cs.CL) [pdf, other]
Title: Curriculum Pre-training for End-to-End Speech Translation
Comments: accepted by ACL2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115]  arXiv:2004.10234 (cross-list from cs.CL) [pdf, ps, other]
Title: ESPnet-ST: All-in-One Speech Translation Toolkit
Comments: Accepted at ACL 2020 System Demonstration (update Table1, fix typo)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116]  arXiv:2004.10345 (cross-list from cs.MM) [pdf, other]
Title: MIDI-Sheet Music Alignment Using Bootleg Score Synthesis
Comments: 8 pages, 6 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[117]  arXiv:2004.10347 (cross-list from cs.MM) [pdf, other]
Title: MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music
Comments: 8 pages, 8 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[118]  arXiv:2004.10454 (cross-list from cs.CL) [pdf, other]
Title: A Study of Non-autoregressive Model for Sequence Generation
Comments: Accepted by ACL 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119]  arXiv:2004.11419 (cross-list from cs.SD) [pdf, other]
Title: End-to-end speech-to-dialog-act recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[120]  arXiv:2004.11724 (cross-list from cs.MM) [pdf, other]
Title: Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages
Comments: 13 pages, 8 figures, 3 tables. Accepted article in IEEE Transactions on Multimedia. arXiv admin note: text overlap with arXiv:2004.10347
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[121]  arXiv:2004.12031 (cross-list from cs.LG) [pdf, ps, other]
Title: On the Role of Visual Cues in Audiovisual Speech Enhancement
Comments: ICASSP 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122]  arXiv:2004.12111 (cross-list from cs.SD) [pdf, ps, other]
Title: Jointly Trained Transformers models for Spoken Language Translation
Comments: 7 pages, 3 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[123]  arXiv:2004.12200 (cross-list from cs.SD) [pdf, other]
Title: Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[124]  arXiv:2004.12569 (cross-list from cs.MM) [pdf, ps, other]
Title: DWT-GBT-SVD-based Robust Speech Steganography
Comments: 10 pages, 4 Figures
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125]  arXiv:2004.13007 (cross-list from cs.IR) [pdf, ps, other]
Title: A session-based song recommendation approach involving user characterization along the play power-law distribution
Comments: Accepted in Complexity (ISSN: 1099-0526)
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126]  arXiv:2004.13595 (cross-list from cs.SD) [pdf, other]
Title: Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise
Comments: submitted to IEEE SPL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[127]  arXiv:2004.13780 (cross-list from cs.CV) [pdf, other]
Title: Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
Comments: Accepted: CVPRW
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128]  arXiv:2004.14228 (cross-list from cs.CL) [pdf, other]
Title: Meta-Transfer Learning for Code-Switched Speech Recognition
Comments: Accepted in ACL 2020. The first two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129]  arXiv:2004.14326 (cross-list from cs.SD) [pdf, other]
Title: Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
Comments: Under submission as a conference paper
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[130]  arXiv:2004.14368 (cross-list from cs.CV) [pdf, other]
Title: VGGSound: A Large-scale Audio-Visual Dataset
Comments: ICASSP2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131]  arXiv:2004.14846 (cross-list from cs.CL) [pdf, other]
Title: The role of context in neural pitch accent detection in English
Journal-ref: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132]  arXiv:2004.14858 (cross-list from cs.MM) [pdf, other]
Title: MuSe 2020 -- The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop
Comments: Baseline Paper MuSe 2020, MuSe Workshop Challenge, ACM Multimedia
Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)