Audio and Speech Processing
Authors and titles for eess.AS in Apr 2020
[ total of 132 entries: 1-132 ]
- [1] arXiv:2004.00001 [pdf, other]
Title: VaPar Synth -- A Variational Parametric Model for Audio Synthesis
Comments: this https URL. Accepted in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [2] arXiv:2004.00175 [pdf, other]
Title: Improved Source Counting and Separation for Monaural Mixture
Subjects: Audio and Speech Processing (eess.AS)
- [3] arXiv:2004.00200 [pdf, other]
Title: On The Differences Between Song and Speech Emotion Recognition: Effect of Feature Sets, Feature Types, and Classifiers
Comments: 2 Figures, 2 Tables
Journal-ref: 2020 IEEE REGION 10 CONFERENCE (TENCON), 968-972
Subjects: Audio and Speech Processing (eess.AS)
- [4] arXiv:2004.00526 [pdf, other]
Title: Improved RawNet with Feature Map Scaling for Text-independent Speaker Verification using Raw Waveforms
Comments: 5 pages, 1 figure, 5 tables, submitted to Interspeech 2020 as a conference paper
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [5] arXiv:2004.00910 [pdf, other]
Title: Improving auditory attention decoding performance of linear and non-linear methods using state-space model
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
- [6] arXiv:2004.00932 [pdf, other]
Title: iMetricGAN: Intelligibility Enhancement for Speech-in-Noise using Generative Adversarial Network-based Metric Learning
Comments: 5 pages, Submitted to INTERSPEECH 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [7] arXiv:2004.00960 [pdf, other]
Title: The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment
Comments: accepted at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [8] arXiv:2004.00967 [pdf, other]
Title: Full-Sum Decoding for Hybrid HMM based Speech Recognition using LSTM Language Model
Comments: accepted at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [9] arXiv:2004.01221 [pdf, other]
Title: Towards Relevance and Sequence Modeling in Language Recognition
Comments: this https URL. Accepted to IEEE Transactions on Audio, Speech and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [10] arXiv:2004.01275 [pdf, other]
Title: AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App
Authors: Ali Imran, Iryna Posokhova, Haneya N. Qureshi, Usama Masood, Muhammad Sajid Riaz, Kamran Ali, Charles N. John, MD Iftikhar Hussain, Muhammad Nabeel
Comments: Accepted in Informatics in Medicine Unlocked 2020
Journal-ref: Informatics in Medicine Unlocked, vol. 20, p. 100378, 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
- [11] arXiv:2004.01495 [pdf, other]
Title: Can Machine Learning Be Used to Recognize and Diagnose Coughs?
Authors: Charles Bales, Muhammad Nabeel, Charles N. John, Usama Masood, Haneya N. Qureshi, Hasan Farooq, Iryna Posokhova, Ali Imran
Comments: Accepted in IEEE International Conference on E-Health and Bioengineering - EHB 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [12] arXiv:2004.01525 [pdf, ps, other]
Title: Towards democratizing music production with AI-Design of Variational Autoencoder-based Rhythm Generator as a DAW plugin
Authors: Nao Tokui
Comments: 4 pages
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [13] arXiv:2004.01546 [pdf, other]
Title: Temporarily-Aware Context Modelling using Generative Adversarial Networks for Speech Activity Detection
Authors: Tharindu Fernando, Sridha Sridharan, Mitchell McLaren, Darshana Priyasad, Simon Denman, Clinton Fookes
Journal-ref: IEEE/ACM Transactions on Audio, Speech and Language Processing, 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [14] arXiv:2004.01559 [pdf, other]
Title: Neural i-vectors
Comments: Accepted to Odyssey 2020: The Speaker and Language Recognition Workshop. Version 2 (bugfix)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
- [15] arXiv:2004.01922 [pdf, other]
Title: Subband modeling for spoofing detection in automatic speaker verification
Comments: Accepted to the Speaker Odyssey (The Speaker and Language Recognition Workshop) 2020 conference. 8 pages
Subjects: Audio and Speech Processing (eess.AS)
- [16] arXiv:2004.02191 [pdf, other]
Title: Using Cyclic Noise as the Source Signal for Neural Source-Filter-based Speech Waveform Model
Comments: Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS)
- [17] arXiv:2004.02355 [pdf, other]
Title: Deep Multilayer Perceptrons for Dimensional Speech Emotion Recognition
Comments: 2 figures, 4 tables, submitted to EUSIPCO 2020
Journal-ref: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2020
Subjects: Audio and Speech Processing (eess.AS)
- [18] arXiv:2004.02420 [pdf, other]
Title: Simultaneous Denoising and Dereverberation Using Deep Embedding Features
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [19] arXiv:2004.02450 [pdf, other]
Title: A bio-inspired geometric model for sound reconstruction
Authors: Ugo Boscain (LJLL (UMR_7598), CNRS, CaGE), Dario Prandi (CNRS, L2S), Ludovic Sacchelli (LAGEPP), Giuseppina Turco (CNRS, LLF UMR7110)
Subjects: Audio and Speech Processing (eess.AS); Analysis of PDEs (math.AP); Optimization and Control (math.OC); Neurons and Cognition (q-bio.NC)
- [20] arXiv:2004.02541 [pdf, other]
Title: Vocoder-Based Speech Synthesis from Silent Videos
Authors: Daniel Michelsanti, Olga Slizovskaia, Gloria Haro, Emilia Gómez, Zheng-Hua Tan, Jesper Jensen
Comments: Accepted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
- [21] arXiv:2004.02863 [pdf, other]
Title: Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs
Comments: Accepted to Interspeech 2020. The code is available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [22] arXiv:2004.03194 [pdf, other]
Title: Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances
Comments: Accepted to Interspeech 2020
Journal-ref: Proc. Interspeech 2020, pp. 1501-1505
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [23] arXiv:2004.03428 [pdf, other]
Title: Universal Adversarial Perturbations Generative Network for Speaker Recognition
Comments: Accepted by ICME2020
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
- [24] arXiv:2004.03434 [pdf, other]
Title: Learning to fool the speaker recognition
Comments: Accepted by ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
- [25] arXiv:2004.03437 [pdf, other]
Title: Homophone-based Label Smoothing in End-to-End Automatic Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [26] arXiv:2004.03512 [pdf, other]
Title: SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, 2021. (c) 2021 IEEE
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [27] arXiv:2004.03586 [pdf, other]
Title: From Artificial Neural Networks to Deep Learning for Music Generation -- History, Concepts and Trends
Authors: Jean-Pierre Briot
Comments: To appear in the Special Issue on Art, Sound and Design in the Neural Computing and Applications Journal
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Machine Learning (stat.ML)
- [28] arXiv:2004.03781 [pdf, other]
Title: Emotional Voice Conversion With Cycle-consistent Adversarial Network
Comments: 5 pages
Subjects: Audio and Speech Processing (eess.AS)
- [29] arXiv:2004.03782 [pdf, other]
Title: Multi-Target Emotional Voice Conversion With Neural Vocoders
Comments: 7 pages
Subjects: Audio and Speech Processing (eess.AS)
- [30] arXiv:2004.04001 [pdf, other]
Title: Noise Tokens: Learning Neural Noise Templates for Environment-Aware Speech Enhancement
Comments: 5 pages, Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS)
- [31] arXiv:2004.04014 [pdf, other]
Title: Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification
Comments: Accepted by Speaker Odyssey 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [32] arXiv:2004.04040 [pdf, other]
Title: Investigation of Singing Voice Separation for Singing Voice Detection in Polyphonic Music
Comments: Accepted by CSMT (The 9th Conference on Sound and Music Technology)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [33] arXiv:2004.04054 [pdf, other]
Title: Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition
Comments: 4th Code-Switch workshop, France
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [34] arXiv:2004.04072 [pdf, ps, other]
Title: CNN-MoE based framework for classification of respiratory anomalies and lung disease detection
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [35] arXiv:2004.04095 [pdf, other]
Title: Deep Normalization for Speaker Vectors
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [36] arXiv:2004.04096 [pdf, ps, other]
Title: Probabilistic embeddings for speaker diarization
Comments: Awarded the Jack Godfrey Best Student Paper Award at Odyssey 2020: The Speaker and Language Recognition Workshop, Tokyo
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [37] arXiv:2004.04098 [pdf, other]
Title: WaveCRN: An Efficient Convolutional Recurrent Neural Network for End-to-end Speech Enhancement
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [38] arXiv:2004.04099 [pdf, ps, other]
Title: Keywords Extraction and Sentiment Analysis using Automatic Speech Recognition
Authors: Rachit Shukla
Comments: 23 pages, 20 figures. Based on the work done as a part of the Science Academies' Summer Research Fellowship Programme (SRFP '19) at Vijña Labs
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
- [39] arXiv:2004.04290 [pdf, other]
Title: An investigation of phone-based subword units for end-to-end speech recognition
Comments: Interspeech 2020 final version. Implementation for reproducing the results can be found at: this https URL
Subjects: Audio and Speech Processing (eess.AS)
- [40] arXiv:2004.04371 [pdf, other]
Title: MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification
Comments: Accepted by IJCNN2022 (The 2022 International Joint Conference on Neural Networks)
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM)
- [41] arXiv:2004.04410 [pdf, other]
Title: Att-HACK: An Expressive Speech Database with Social Attitudes
Comments: 5 pages, 5 figures
Subjects: Audio and Speech Processing (eess.AS)
- [42] arXiv:2004.04459 [pdf, ps, other]
Title: Fast frequency discrimination and phoneme recognition using a biomimetic membrane coupled to a neural network
Comments: 7 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Biological Physics (physics.bio-ph)
- [43] arXiv:2004.04731 [pdf, other]
Title: Advancing Speech Synthesis using EEG
Comments: Under review
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [44] arXiv:2004.05274 [pdf, other]
Title: Improved Speech Representations with Multi-Target Autoregressive Predictive Coding
Comments: Accepted to ACL 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [45] arXiv:2004.05830 [pdf, other]
Title: From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech
Comments: 18 pages, 12 figures, Published as a conference paper at International Conference on Learning Representations (ICLR) 2020. (camera-ready version)
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
- [46] arXiv:2004.05989 [pdf, other]
Title: Data augmentation using generative networks to identify dementia
Authors: Bahman Mirheidari, Yilin Pan, Daniel Blackburn, Ronan O'Malley, Traci Walker, Annalena Venneri, Markus Reuber, Heidi Christensen
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [47] arXiv:2004.06332 [src]
Title: Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment
Comments: This paper has been rejected by INTERSPEECH 2020. It has been modified extensively and submitted to APSIPA ASC 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [48] arXiv:2004.06338 [pdf, ps, other]
Title: Transformer based Grapheme-to-Phoneme Conversion
Comments: INTERSPEECH 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [49] arXiv:2004.06422 [pdf, other]
Title: An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification
Comments: Accepted to Speaker Odyssey (The Speaker and Language Recognition Workshop), 2020, 8 pages
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [50] arXiv:2004.06480 [pdf, other]
Title: Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech
Comments: SLTU 2020. arXiv admin note: text overlap with arXiv:2003.03135
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [51] arXiv:2004.06579 [pdf, other]
Title: The Hearpiece database of individual transfer functions of an openly available in-the-ear earpiece for hearing device research
Comments: 14 pages, 13 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [52] arXiv:2004.06756 [pdf, other]
Title: Speaker Diarization with Lexical Information
Authors: Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bowen Zhou, Panayiotis Georgiou, Shrikanth Narayanan
Journal-ref: Interspeech 2019, 391-395
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [53] arXiv:2004.06833 [pdf, ps, other]
Title: Alzheimer's Dementia Recognition through Spontaneous Speech: The ADReSS Challenge
Comments: To appear in the Proceedings of INTERSPEECH 2020, Oct 2020, Shanghai, China
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Machine Learning (stat.ML)
- [54] arXiv:2004.07370 [pdf, other]
Title: F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [55] arXiv:2004.07832 [pdf, other]
Title: Knowledge-and-Data-Driven Amplitude Spectrum Prediction for Hierarchical Neural Vocoders
Comments: Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [56] arXiv:2004.07948 [pdf, other]
Title: Sound of Guns: Digital Forensics of Gun Audio Samples meets Artificial Intelligence
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [57] arXiv:2004.07992 [pdf, other]
Title: Speech Paralinguistic Approach for Detecting Dementia Using Gated Convolutional Neural Network
Authors: Mariana Rodrigues Makiuchi, Tifani Warnita, Nakamasa Inoue, Koichi Shinoda, Michitaka Yoshimura, Momoko Kitazawa, Kei Funaki, Yoko Eguchi, Taishiro Kishimoto
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Quantitative Methods (q-bio.QM)
- [58] arXiv:2004.08248 [pdf, ps, other]
Title: Acoustical classification of different speech acts using nonlinear methods
Authors: Chirayata Bhattacharyya, Sourya Sengupta, Sayan Nag, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh
Comments: 6 pages, 2 figures; Proceedings of WESPAC 2018, New Delhi, India, November 11-15, 2018
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Chaotic Dynamics (nlin.CD); Neurons and Cognition (q-bio.NC)
- [59] arXiv:2004.08250 [pdf, other]
Title: How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition
Comments: in IEEE/ACM Transactions on Audio, Speech, and Language Processing (to appear)
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
- [60] arXiv:2004.08287 [pdf, other]
Title: Deep Neural Network for Respiratory Sound Classification in Wearable Devices Enabled by Patient Specific Model Tuning
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [61] arXiv:2004.08326 [pdf, other]
Title: SpEx: Multi-Scale Time Domain Speaker Extraction Network
Comments: ACCEPTED in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [62] arXiv:2004.08531 [pdf, other]
Title: MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition
Subjects: Audio and Speech Processing (eess.AS)
- [63] arXiv:2004.08849 [pdf, other]
Title: The Attacker's Perspective on Automatic Speaker Verification: An Overview
Comments: 5 pages, 1 figure, Submitted to Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR)
- [64] arXiv:2004.09347 [pdf, other]
Title: End-to-End Whisper to Natural Speech Conversion using Modified Transformer Network
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [65] arXiv:2004.09571 [pdf, other]
Title: Language-agnostic Multilingual Modeling
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Machine Learning (stat.ML)
- [66] arXiv:2004.09584 [pdf, other]
Title: ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric
Authors: Michael Chinen, Felicia S. C. Lim, Jan Skoglund, Nikita Gureev, Feargus O'Gorman, Andrew Hines
Comments: 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
- [67] arXiv:2004.09607 [pdf, other]
Title: Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System
Comments: 8 pages, 2 figures, submitted to Oriental COCOSDA
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [68] arXiv:2004.10120 [pdf, other]
Title: Vector Quantized Contrastive Predictive Coding for Template-based Music Generation
Comments: 15 pages, 13 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [69] arXiv:2004.10246 [pdf, ps, other]
Title: Music Generation with Temporal Structure Augmentation
Authors: Shakeel Raja
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [70] arXiv:2004.10391 [pdf, other]
Title: Towards Linking the Lakh and IMSLP Datasets
Authors: TJ Tsai
Comments: 5 pages, 4 figures, 1 table. Accepted paper at the International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2020
Subjects: Audio and Speech Processing (eess.AS); Multimedia (cs.MM); Sound (cs.SD); Image and Video Processing (eess.IV)
- [71] arXiv:2004.10799 [pdf, other]
Title: Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner Party Transcription
Comments: Accepted by Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [72] arXiv:2004.10823 [pdf, other]
Title: Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit
Comments: 5 pages. Accepted by ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [73] arXiv:2004.11012 [pdf, other]
Title: ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders
Authors: Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma
Comments: Accepted by ISCSLP2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [74] arXiv:2004.11162 [pdf, other]
Title: Flexible framework for audio reconstruction
Journal-ref: 23rd International Conference on Digital Audio Effects (eDAFx2020)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [75] arXiv:2004.11284 [pdf, other]
Title: Unsupervised Speech Decomposition via Triple Information Bottleneck
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [76] arXiv:2004.11544 [pdf, other]
Title: Towards Fast and Accurate Streaming End-to-End ASR
Authors: Bo Li, Shuo-yiin Chang, Tara N. Sainath, Ruoming Pang, Yanzhang He, Trevor Strohman, Yonghui Wu
Comments: Accepted in ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS)
- [77] arXiv:2004.11956 [pdf, other]
Title: Binaural Audio Source Remixing with Microphone Array Listening Devices
Comments: To appear at ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [78] arXiv:2004.12046 [pdf, ps, other]
Title: Sound Event Detection Utilizing Graph Laplacian Regularization with Event Co-occurrence
Comments: Accepted to IEICE Transactions on Information and Systems
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [79] arXiv:2004.12071 [pdf, other]
Title: Active Voice Authentication
Comments: 39 pages, 4 figures
Journal-ref: Digital Signal Processing, Volume 101, June 2020, 102672, ISSN 1051-2004
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
- [80] arXiv:2004.12261 [pdf, other]
Title: Enabling Fast and Universal Audio Adversarial Attack Using Generative Model
Comments: Published at AAAI-21
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [81] arXiv:2004.12745 [pdf, other]
Title: Time-Frequency Analysis and Parameterisation of Knee Sounds for Non-invasive Detection of Osteoarthritis
Comments: Submitted to IEEE Transactions on Biomedical Engineering
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [82] arXiv:2004.13172 [pdf, other]
Title: Autoencoding Neural Networks as Musical Audio Synthesizers
Journal-ref: Proceedings of the 21st International Conference on Digital Audio Effects (DAFx-18), 2018, pp. 40-44
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [83] arXiv:2004.13480 [pdf, other]
Title: L-Vector: Neural Label Embedding for Domain Adaptation
Comments: 5 pages, 2 figures, ICASSP 2020
Journal-ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [84] arXiv:2004.13521 [pdf, ps, other]
Title: Detect Language of Transliterated Texts
Authors: Sourav Sen
Comments: 10 pages, 8 figures, 3 tables
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [85] arXiv:2004.13522 [pdf, ps, other]
Title: Research on Modeling Units of Transformer Transducer for Mandarin Speech Recognition
Comments: 5 pages, 3 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [86] arXiv:2004.13670 [pdf, other]
Title: Neural Speech Separation Using Spatially Distributed Microphones
Comments: 5 pages, 2 figures, Interspeech2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [87] arXiv:2004.13764 [pdf, other]
Title: Conditional Spoken Digit Generation with StyleGAN
Comments: Interspeech2020 accepted version
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [88] arXiv:2004.14091 [pdf, other]
Title: Determined BSS based on time-frequency masking and its application to harmonic vector analysis
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
- [89] arXiv:2004.14617 [pdf, other]
Title: CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech
Authors: Sri Karlapati, Alexis Moinet, Arnaud Joly, Viacheslav Klimkov, Daniel Sáez-Trigueros, Thomas Drugman
Journal-ref: INTERSPEECH 2020: 4387-4391
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [90] arXiv:2004.14762 [pdf, other]
Title: Time-domain speaker extraction network
Comments: Published in ASRU 2019. arXiv admin note: text overlap with arXiv:2004.08326
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [91] arXiv:2004.14832 [pdf, other]
Title: A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications
Subjects: Audio and Speech Processing (eess.AS); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Sound (cs.SD)
- [92] arXiv:2004.14840 [pdf, other]
Title: Multiresolution and Multimodal Speech Recognition with Transformers
Comments: Accepted for ACL 2020
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [93] arXiv:2004.14859 [pdf, other]
Title: Robust Phonetic Segmentation Using Spectral Transition measure for Non-Standard Recording Environments
Comments: 6 pages, 6 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [94] arXiv:2004.03712 (cross-list from eess.SP) [pdf, other]
Title: Heart Sound Segmentation using Bidirectional LSTMs with Attention
Authors: Tharindu Fernando, Houman Ghaemmaghami, Simon Denman, Sridha Sridharan, Nayyar Hussain, Clinton Fookes
Comments: IEEE Journal of Biomedical and Health Informatics, 25 October 2019
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
- [95] arXiv:2004.03926 (cross-list from eess.SP) [pdf, other]
Title: MM Algorithms for Joint Independent Subspace Analysis with Application to Blind Single and Multi-Source Extraction
Comments: 15 pages, 4 figures
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [96] arXiv:2004.00132 (cross-list from cs.SD) [pdf, other]
Title: AM-MobileNet1D: A Portable Model for Speaker Recognition
Comments: 2020 International Joint Conference on Neural Networks (IJCNN)
Journal-ref: 2020 International Joint Conference on Neural Networks (IJCNN)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [97] arXiv:2004.01023 (cross-list from cs.MM) [pdf, other]
Title: Multi-Modal Video Forensic Platform for Investigating Post-Terrorist Attack Scenarios
Journal-ref: In Proceedings of the 11th ACM Multimedia Systems Conference (MMSys2020), June 06-11, 2020, Istanbul, Turkey
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Computers and Society (cs.CY); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [98] arXiv:2004.02219 (cross-list from cs.CL) [pdf, other]
Title: Speaker Recognition using SincNet and X-Vector Fusion
Comments: The 19th International Conference on Artificial Intelligence and Soft Computing
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [99] arXiv:2004.03413 (cross-list from cs.MM) [pdf, other]
Title: Direct Speech-to-image Translation
Comments: Accepted by JSTSP
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [100] arXiv:2004.03873 (cross-list from cs.SD) [pdf, other]
Title: Conditioned Source Separation for Music Instrument Performances
Comments: 14 pages, 5 figures, under review
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [101] arXiv:2004.04662 (cross-list from cs.LG) [pdf, other]
Title: Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences
Comments: 35th AAAI Conference on Artificial Intelligence (AAAI-21)
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [102] arXiv:2004.04972 (cross-list from cs.CL) [pdf, other]
Title: Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data
Comments: Accepted to IEEE ICASSP 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [103] arXiv:2004.05985 (cross-list from cs.CL) [pdf, ps, other]
Title: Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?
Authors: Łukasz Augustyniak, Piotr Szymanski, Mikołaj Morzy, Piotr Zelasko, Adrian Szymczak, Jan Mizgajski, Yishay Carmiel, Najim Dehak
Comments: submitted to INTERSPEECH'20
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [104] arXiv:2004.07070 (cross-list from cs.CL) [pdf, other]
Title: Analyzing analytical methods: The case of phonology in neural models of spoken language
Comments: ACL 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [105] arXiv:2004.07171 (cross-list from cs.SD) [pdf, other]
Title: Musical Features for Automatic Music Transcription Evaluation
Comments: Technical report
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
- [106] arXiv:2004.07301 (cross-list from cs.CV) [pdf, other]
Title: ESResNet: Environmental Sound Classification Based on Visual Domain Models
Comments: 8 pages, 4 figures; submitted to ICPR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [107] arXiv:2004.07442 (cross-list from cs.CR) [pdf, other]
Title: Voice-Indistinguishability: Protecting Voiceprint in Privacy-Preserving Speech Data Release
Comments: The paper has been accepted by the IEEE International Conference on Multimedia & Expo 2020 (ICME 2020)
Subjects: Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [108] arXiv:2004.07800 (cross-list from cs.HC) [pdf, other]
Title: Leveraging GANs to Improve Continuous Path Keyboard Input Models
Subjects: Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
- [109] arXiv:2004.07820 (cross-list from cs.SD) [pdf, ps, other]
Title: Speaker Recognition in Bengali Language from Nonlinear Features
Authors: Uddalok Sarkar, Soumyadeep Pal, Sayan Nag, Chirayata Bhattacharya, Shankha Sanyal, Archi Banerjee, Ranjan Sengupta, Dipak Ghosh
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [110] arXiv:2004.08269 (cross-list from cs.SD) [pdf, other]
Title: Beat Detection and Automatic Annotation of the Music of Bharatanatyam Dance using Speech Recognition Techniques
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [111] arXiv:2004.09249 (cross-list from cs.SD) [pdf, other]
Title: CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings
Authors: Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, David Snyder, Aswin Shanmugam Subramanian, Jan Trmal, Bar Ben Yair, Christoph Boeddeker, Zhaoheng Ni, Yusuke Fujita, Shota Horiguchi, Naoyuki Kanda, Takuya Yoshioka, Neville Ryant
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [112] arXiv:2004.09476 (cross-list from cs.CV) [pdf, other]
Title: Music Gesture for Visual Sound Separation
Comments: CVPR 2020. Project page: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [113] arXiv:2004.10087 (cross-list from cs.CL) [pdf, other]
Title: AGIF: An Adaptive Graph-Interactive Framework for Joint Multiple Intent Detection and Slot Filling
Comments: Accepted at Findings of EMNLP 2020. Data and code are available at this https URL
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [114] arXiv:2004.10093 (cross-list from cs.CL) [pdf, other]
Title: Curriculum Pre-training for End-to-End Speech Translation
Comments: Accepted by ACL 2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [115] arXiv:2004.10234 (cross-list from cs.CL) [pdf, ps, other]
Title: ESPnet-ST: All-in-One Speech Translation Toolkit
Authors: Hirofumi Inaguma, Shun Kiyono, Kevin Duh, Shigeki Karita, Nelson Enrique Yalta Soplin, Tomoki Hayashi, Shinji Watanabe
Comments: Accepted at ACL 2020 System Demonstration (updated Table 1, fixed typo)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [116] arXiv:2004.10345 (cross-list from cs.MM) [pdf, other]
Title: MIDI-Sheet Music Alignment Using Bootleg Score Synthesis
Comments: 8 pages, 6 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
- [117] arXiv:2004.10347 (cross-list from cs.MM) [pdf, other]
Title: MIDI Passage Retrieval Using Cell Phone Pictures of Sheet Music
Comments: 8 pages, 8 figures, 1 table. Accepted paper at the International Society for Music Information Retrieval Conference (ISMIR) 2019
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
- [118] arXiv:2004.10454 (cross-list from cs.CL) [pdf, other]
Title: A Study of Non-autoregressive Model for Sequence Generation
Comments: Accepted by ACL 2020
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [119] arXiv:2004.11419 (cross-list from cs.SD) [pdf, other]
Title: End-to-end speech-to-dialog-act recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [120] arXiv:2004.11724 (cross-list from cs.MM) [pdf, other]
Title: Using Cell Phone Pictures of Sheet Music To Retrieve MIDI Passages
Comments: 13 pages, 8 figures, 3 tables. Accepted article in IEEE Transactions on Multimedia. arXiv admin note: text overlap with arXiv:2004.10347
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
- [121] arXiv:2004.12031 (cross-list from cs.LG) [pdf, ps, other]
Title: On the Role of Visual Cues in Audiovisual Speech Enhancement
Authors: Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz
Comments: ICASSP 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [122] arXiv:2004.12111 (cross-list from cs.SD) [pdf, ps, other]
Title: Jointly Trained Transformers models for Spoken Language Translation
Comments: 7 pages, 3 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [123] arXiv:2004.12200 (cross-list from cs.SD) [pdf, other]
Title: Depthwise Separable Convolutional ResNet with Squeeze-and-Excitation Blocks for Small-footprint Keyword Spotting
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [124] arXiv:2004.12569 (cross-list from cs.MM) [pdf, ps, other]
Title: DWT-GBT-SVD-based Robust Speech Steganography
Comments: 10 pages, 4 Figures
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [125] arXiv:2004.13007 (cross-list from cs.IR) [pdf, ps, other]
Title: A session-based song recommendation approach involving user characterization along the play power-law distribution
Authors: Diego Sánchez-Moreno, Vivian F. López Batista, M. Dolores Muñoz Vicente, Ana B. Gil González, María N. Moreno-García
Comments: Accepted in Complexity (ISSN: 1099-0526)
Subjects: Information Retrieval (cs.IR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [126] arXiv:2004.13595 (cross-list from cs.SD) [pdf, other]
Title: Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise
Comments: submitted to IEEE SPL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [127] arXiv:2004.13780 (cross-list from cs.CV) [pdf, other]
Title: Cross-modal Speaker Verification and Recognition: A Multilingual Perspective
Authors: Muhammad Saad Saeed, Shah Nawaz, Pietro Morerio, Arif Mahmood, Ignazio Gallo, Muhammad Haroon Yousaf, Alessio Del Bue
Comments: Accepted: CVPRW
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [128] arXiv:2004.14228 (cross-list from cs.CL) [pdf, other]
Title: Meta-Transfer Learning for Code-Switched Speech Recognition
Comments: Accepted in ACL 2020. The first two authors contributed equally to this work
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [129] arXiv:2004.14326 (cross-list from cs.SD) [pdf, other]
Title: Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision
Comments: Under submission as a conference paper
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
- [130] arXiv:2004.14368 (cross-list from cs.CV) [pdf, other]
Title: VGGSound: A Large-scale Audio-Visual Dataset
Comments: ICASSP 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [131] arXiv:2004.14846 (cross-list from cs.CL) [pdf, other]
Title: The role of context in neural pitch accent detection in English
Journal-ref: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [132] arXiv:2004.14858 (cross-list from cs.MM) [pdf, other]
Title: MuSe 2020 -- The First International Multimodal Sentiment Analysis in Real-life Media Challenge and Workshop
Authors: Lukas Stappen, Alice Baird, Georgios Rizos, Panagiotis Tzirakis, Xinchen Du, Felix Hafner, Lea Schumann, Adria Mallol-Ragolta, Björn W. Schuller, Iulia Lefter, Erik Cambria, Ioannis Kompatsiaris
Comments: Baseline Paper MuSe 2020, MuSe Workshop Challenge, ACM Multimedia
Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)