Audio and Speech Processing

Authors and titles for eess.AS in Nov 2020

[ total of 227 entries: 1-227 ]
[1]  arXiv:2011.00030 [pdf, other]
Title: A Curated Dataset of Urban Scenes for Audio-Visual Scene Analysis
Comments: accepted by ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS)
[2]  arXiv:2011.00091 [pdf, other]
Title: Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization
Comments: submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[3]  arXiv:2011.00175 [pdf, other]
Title: Multimodal Urban Sound Tagging with Spatiotemporal Context
Subjects: Audio and Speech Processing (eess.AS)
[4]  arXiv:2011.00316 [pdf, other]
Title: AGAIN-VC: A One-shot Voice Conversion using Activation Guidance and Adaptive Instance Normalization
Comments: Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[5]  arXiv:2011.00502 [pdf, other]
Title: Focusing Phenomena in Linear Discrete Inverse Problems in Acoustics
Comments: 33 pages, 23 figures, submitted for review to the Journal of Sound and Vibration; fixed typos and minor revision in sections 6.1.4-6.1.5 and 6.2
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[6]  arXiv:2011.00699 [pdf, other]
Title: Transformer-based Arabic Dialect Identification
Comments: Accepted for publication in International Conference on Asian Language Processing (IALP) 2020
Subjects: Audio and Speech Processing (eess.AS)
[7]  arXiv:2011.00721 [pdf, other]
Title: Robust Raw Waveform Speech Recognition Using Relevance Weighted Representations
Comments: arXiv admin note: text overlap with arXiv:2001.07067
Journal-ref: Proc. Interspeech 2020, 1649-1653 (2020)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[8]  arXiv:2011.00935 [pdf, other]
Title: FeatherTTS: Robust and Efficient attention based Neural TTS
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[9]  arXiv:2011.01108 [pdf, ps, other]
Title: End-to-end anti-spoofing with RawNet2
Comments: Accepted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS)
[10]  arXiv:2011.01130 [pdf, other]
Title: Speaker anonymisation using the McAdams coefficient
Comments: Accepted at INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[11]  arXiv:2011.01174 [pdf, other]
Title: Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech
Comments: 9 pages, 5 figures, 4 tables
Journal-ref: IEEE Access, vol. 10, pp. 52621 - 52629, 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[12]  arXiv:2011.01175 [pdf, other]
Title: CAMP: a Two-Stage Approach to Modelling Prosody in Context
Comments: 5 pages. Published in the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects: Audio and Speech Processing (eess.AS)
[13]  arXiv:2011.01210 [pdf, other]
Title: Focus on the present: a regularization method for the ASR source-target attention layer
Comments: submitted to ICASSP2021. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[14]  arXiv:2011.01557 [pdf, other]
Title: StyleMelGAN: An Efficient High-Fidelity Adversarial Vocoder with Temporal Adaptive Normalization
Comments: Accepted to ICASSP2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[15]  arXiv:2011.01570 [pdf, other]
Title: Dynamic latency speech recognition with asynchronous revision
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[16]  arXiv:2011.01576 [pdf, other]
Title: Improving RNN transducer with normalized jointer network
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[17]  arXiv:2011.01678 [pdf, other]
Title: Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion
Comments: Accepted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[18]  arXiv:2011.01686 [pdf, ps, other]
Title: Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization
Comments: To appear in ISCSLP2021
Subjects: Audio and Speech Processing (eess.AS)
[19]  arXiv:2011.01691 [pdf, other]
Title: A Study of Incorporating Articulatory Movement Information in Speech Enhancement
Subjects: Audio and Speech Processing (eess.AS)
[20]  arXiv:2011.01965 [pdf, ps, other]
Title: Short-time deep-learning based source separation for speech enhancement in reverberant environments with beamforming
Subjects: Audio and Speech Processing (eess.AS)
[21]  arXiv:2011.01986 [pdf, other]
Title: Unsupervised Pattern Discovery from Thematic Speech Archives Based on Multilingual Bottleneck Features
Comments: 8 pages, accepted and presented in APSIPA-APC 2018. This work was done when Man-Ling Sung and Siyuan Feng were postgraduate students in the Chinese University of Hong Kong
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[22]  arXiv:2011.01991 [pdf, other]
Title: Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition
Comments: 8 pages, 2 figures, SLT 2021
Journal-ref: 2021 IEEE Spoken Language Technology Workshop (SLT)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[23]  arXiv:2011.01997 [pdf, other]
Title: DOVER-Lap: A Method for Combining Overlap-aware Diarization Outputs
Comments: Accepted to IEEE SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[24]  arXiv:2011.02008 [pdf, other]
Title: Complex ratio masking for singing voice separation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[25]  arXiv:2011.02014 [pdf, other]
Title: Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis
Comments: Accepted to IEEE SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[26]  arXiv:2011.02090 [pdf, other]
Title: Frustratingly Easy Noise-aware Training of Acoustic Models
Comments: 6 + 3 (Appendix) pages
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[27]  arXiv:2011.02102 [pdf, other]
Title: Robust Speaker Extraction Network Based on Iterative Refined Adaptation
Comments: Accepted by Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS)
[28]  arXiv:2011.02109 [pdf, ps, other]
Title: Deep Multi-task Network for Delay Estimation and Echo Cancellation
Comments: Accepted by Interspeech 2020
Subjects: Audio and Speech Processing (eess.AS)
[29]  arXiv:2011.02132 [pdf, other]
Title: Multi-Modal Transformers Utterance-Level Code-Switching Detection
Authors: Krishna D N
Comments: 8 pages, 2 figures
Subjects: Audio and Speech Processing (eess.AS)
[30]  arXiv:2011.02136 [pdf, other]
Title: Interpretable Representation Learning for Speech and Audio Signals Based on Relevance Weighting
Comments: arXiv admin note: text overlap with arXiv:2011.00721
Journal-ref: IEEE Transactions on Audio, Speech and Language Processing, Vol. 28, pp. 2823 - 2836, 2020
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[31]  arXiv:2011.02168 [pdf, other]
Title: Learning in your voice: Non-parallel voice conversion based on speaker consistency loss
Comments: ICASSP 2021 submitted
Subjects: Audio and Speech Processing (eess.AS)
[32]  arXiv:2011.02252 [pdf, other]
Title: Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
Comments: 5 pages and 3 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[33]  arXiv:2011.02421 [pdf, other]
Title: One-shot conditional audio filtering of arbitrary sounds
Subjects: Audio and Speech Processing (eess.AS)
[34]  arXiv:2011.02561 [pdf, other]
Title: A Multi-Channel Temporal Attention Convolutional Neural Network Model for Environmental Sound Classification
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[35]  arXiv:2011.02619 [pdf, ps, other]
Title: Don't look back: an online beat tracking method using RNN and enhanced particle filtering
Comments: Accepted at the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[36]  arXiv:2011.02698 [pdf, other]
Title: A Comparison Study on Infant-Parent Voice Diarization
Comments: ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[37]  arXiv:2011.02774 [pdf, other]
Title: Multi-Accent Adaptation based on Gate Mechanism
Comments: Accepted in INTERSPEECH 2019
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[38]  arXiv:2011.02782 [pdf, other]
Title: Domain Adaptation Using Class Similarity for Robust Speech Recognition
Comments: Accepted in INTERSPEECH 2020
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[39]  arXiv:2011.02900 [pdf, other]
Title: Multi-class Spectral Clustering with Overlaps for Speaker Diarization
Comments: Accepted at IEEE SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[40]  arXiv:2011.02921 [pdf, ps, other]
Title: Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR
Comments: Submitted to ICASSP 2021. arXiv admin note: text overlap with arXiv:2006.10930, arXiv:2008.04546
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[41]  arXiv:2011.02949 [pdf, other]
Title: Anomalous Sound Detection as a Simple Binary Classification Problem with Careful Selection of Proxy Outlier Examples
Comments: published in DCASE 2020 Workshop
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[42]  arXiv:2011.03110 [pdf, other]
Title: Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription
Comments: Accepted to SLT2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[43]  arXiv:2011.03115 [pdf, ps, other]
Title: A Hierarchical Subspace Model for Language-Attuned Acoustic Unit Discovery
Comments: Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[44]  arXiv:2011.03118 [pdf, other]
Title: Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages
Comments: In Proceedings of The First Workshop on Speech Technologies for Code-Switching in Multilingual Communities
Journal-ref: http://festvox.org/cedar/WSTCSMC2020.pdf
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[45]  arXiv:2011.03426 [src]
Title: Self-Supervised Learning from Contrastive Mixtures for Personalized Speech Enhancement
Comments: This work has been superseded by article 2104.02017
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[46]  arXiv:2011.03432 [pdf, ps, other]
Title: Misalignment Recognition in Acoustic Sensor Networks using a Semi-supervised Source Estimation Method and Markov Random Fields
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[47]  arXiv:2011.03706 [pdf, other]
Title: ESPnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration
Comments: Accepted by SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[48]  arXiv:2011.03810 [pdf, other]
Title: Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[49]  arXiv:2011.03943 [pdf, other]
Title: Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Authors: Daxin Tan, Tan Lee
Comments: Accepted by Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[50]  arXiv:2011.04084 [pdf, other]
Title: Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations
Comments: Accepted at SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Image and Video Processing (eess.IV)
[51]  arXiv:2011.04359 [pdf, ps, other]
Title: An Empirical Study of Visual Features for DNN based Audio-Visual Speech Enhancement in Multi-talker Environments
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[52]  arXiv:2011.04456 [pdf, other]
Title: Efficient Training Data Generation for Phase-Based DOA Estimation
Comments: Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[53]  arXiv:2011.04569 [pdf, other]
Title: Informed Source Extraction With Application to Acoustic Echo Reduction
Comments: Published at ITG 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[54]  arXiv:2011.04785 [pdf, ps, other]
Title: Benchmarking LF-MMI, CTC and RNN-T Criteria for Streaming ASR
Comments: Accepted for publication at IEEE Spoken Language Technology Workshop (SLT), 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[55]  arXiv:2011.04896 [pdf, ps, other]
Title: An Empirical Study on Text-Independent Speaker Verification based on the GE2E Method
Comments: 6 pages, 7 tables, 2 figures, 4 algorithms. An empirical study on the paper arXiv:1710.10467 by Wan et al. (2017)
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[56]  arXiv:2011.05038 [pdf, other]
Title: Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model
Comments: 8 pages. Accepted to IEEE SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[57]  arXiv:2011.05161 [pdf, other]
Title: Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[58]  arXiv:2011.05540 [pdf, other]
Title: Surrogate Source Model Learning for Determined Source Separation
Comments: 5 pages, 3 figures, 1 table. Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[59]  arXiv:2011.05649 [pdf, other]
Title: Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients
Comments: Accepted by IEEE SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[60]  arXiv:2011.05707 [pdf, other]
Title: Low-resource expressive text-to-speech using data augmentation
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[61]  arXiv:2011.05731 [pdf, other]
Title: FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation
Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME) 2021
Subjects: Audio and Speech Processing (eess.AS)
[62]  arXiv:2011.05958 [pdf, other]
Title: On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments
Comments: Presented at IEEE ICASSP 2020
Journal-ref: Proc. ICASSP (2020) 6389-6393
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL)
[63]  arXiv:2011.06110 [pdf, other]
Title: Efficient Knowledge Distillation for RNN-Transducer Models
Comments: 5 pages, 1 figure, 2 tables; submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[64]  arXiv:2011.06239 [pdf, other]
Title: The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge
Comments: Submitted to 2021 SLT Children Speech Recognition Challenge (CSRC)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[65]  arXiv:2011.06465 [pdf, other]
Title: Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
Comments: Accepted by SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[66]  arXiv:2011.06548 [pdf, other]
Title: Evaluating the Intelligibility Benefits of Neural Speech Enrichment for Listeners with Normal Hearing and Hearing Impairment using the Greek Harvard Corpus
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[67]  arXiv:2011.06739 [pdf, other]
Title: Generalized Dilated CNN Models for Depression Detection Using Inverted Vocal Tract Variables
Comments: 5 pages, Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[68]  arXiv:2011.07065 [pdf, other]
Title: Multi-Modal Emotion Detection with Transfer Learning
Comments: 11 pages, 7 tables, 2 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
[69]  arXiv:2011.07274 [pdf, other]
Title: On Filter Generalization for Music Bandwidth Extension Using Deep Neural Networks
Comments: Qualitative examples on this https URL. Source code on this https URL.
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[70]  arXiv:2011.07338 [pdf, other]
Title: Distortion-controlled Training for End-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss
Comments: SLT 2021
Subjects: Audio and Speech Processing (eess.AS)
[71]  arXiv:2011.07545 [pdf, ps, other]
Title: Automatic dysarthric speech detection exploiting pairwise distance-based convolutional neural networks
Comments: accepted at ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS)
[72]  arXiv:2011.07547 [pdf, other]
Title: Multi-task single channel speech enhancement using speech presence probability as a secondary task training target
Comments: EUSIPCO 2021
Subjects: Audio and Speech Processing (eess.AS)
[73]  arXiv:2011.07755 [pdf, other]
Title: Audio-visual Multi-channel Integration and Recognition of Overlapped Speech
Comments: TASLP 2021
Subjects: Audio and Speech Processing (eess.AS)
[74]  arXiv:2011.07791 [pdf, other]
Title: Block-Online Guided Source Separation
Comments: Accepted to SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[75]  arXiv:2011.07859 [pdf, other]
Title: A General Network Architecture for Sound Event Localization and Detection Using Transfer Learning and Recurrent Neural Network
Subjects: Audio and Speech Processing (eess.AS)
[76]  arXiv:2011.08346 [pdf, other]
Title: Refining Automatic Speech Recognition System for older adults
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[77]  arXiv:2011.08397 [pdf, other]
Title: Ultra-Lightweight Speech Separation via Group Communication
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[78]  arXiv:2011.08400 [pdf, other]
Title: Rethinking the Separation Layers in Speech Separation Networks
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[79]  arXiv:2011.08401 [pdf, other]
Title: Implicit Filter-and-sum Network for Multi-channel Speech Separation
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[80]  arXiv:2011.08480 [pdf, other]
Title: s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Comments: 5 pages, 5 figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[81]  arXiv:2011.09044 [pdf, other]
Title: Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding
Comments: 7 pages, 6 figures
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[82]  arXiv:2011.09162 [pdf, other]
Title: WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation
Comments: accepted by SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[83]  arXiv:2011.09270 [pdf, other]
Title: Respiratory Distress Detection from Telephone Speech using Acoustic and Prosodic Features
Comments: 5 pages, 4 figures
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[84]  arXiv:2011.09624 [pdf, other]
Title: Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals
Comments: Accepted in ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[85]  arXiv:2011.09631 [pdf, other]
Title: Universal MelGAN: A Robust Neural Vocoder for High-Fidelity Waveform Generation in Multiple Domains
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[86]  arXiv:2011.09804 [pdf, other]
Title: TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos
Comments: 8 pages, 4 figures, Accepted to SLT2021, IEEE Spoken Language Technology Workshop
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Image and Video Processing (eess.IV)
[87]  arXiv:2011.10345 [pdf, other]
Title: Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement
Comments: submitted to the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, Ontario, Canada
Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[88]  arXiv:2011.10527 [pdf, other]
Title: Multi-Scale Speaker Diarization With Neural Affinity Score Fusion
Comments: Submitted to ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS)
[89]  arXiv:2011.10538 [pdf, other]
Title: Improving RNN-T ASR Accuracy Using Context Audio
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[90]  arXiv:2011.10706 [pdf, other]
Title: Speech Denoising with Auditory Models
Comments: First two authors contributed equally, 5 pages, 3 PDF figures
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[91]  arXiv:2011.10798 [pdf, other]
Title: A Better and Faster End-to-End Model for Streaming ASR
Comments: Accepted in ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[92]  arXiv:2011.11315 [pdf, other]
Title: End-to-end Silent Speech Recognition with Acoustic Sensing
Comments: will be presented in SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[93]  arXiv:2011.11564 [pdf, other]
Title: Using Synthetic Audio to Improve The Recognition of Out-Of-Vocabulary Words in End-To-End ASR Systems
Comments: To appear in Proc. ICASSP2021, June 06-11, 2021, Toronto, Ontario, Canada
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[94]  arXiv:2011.11671 [pdf, other]
Title: Streaming Multi-speaker ASR with RNN-T
Comments: Accepted at ICASSP2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[95]  arXiv:2011.11818 [pdf, other]
Title: Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[96]  arXiv:2011.11984 [pdf, other]
Title: Integration of variational autoencoder and spatial clustering for adaptive multi-channel neural speech separation
Comments: 8 pages, 3 figures, to be published in SLT2021
Subjects: Audio and Speech Processing (eess.AS)
[97]  arXiv:2011.12063 [pdf, other]
Title: How Far Are We from Robust Voice Conversion: A Survey
Comments: Accepted by SLT 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[98]  arXiv:2011.12133 [pdf, other]
Title: Zero-Shot Audio Classification via Semantic Embeddings
Comments: Submitted to Transactions on Audio, Speech and Language Processing
Subjects: Audio and Speech Processing (eess.AS)
[99]  arXiv:2011.12206 [pdf, other]
Title: TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis
Subjects: Audio and Speech Processing (eess.AS)
[100]  arXiv:2011.12221 [pdf, ps, other]
Title: A light transformer for speech-to-intent applications
Comments: To be published in SLT 2021
Subjects: Audio and Speech Processing (eess.AS)
[101]  arXiv:2011.12564 [pdf, ps, other]
Title: Soft-Median Choice: An Automatic Feature Smoothing Method for Sound Event Detection
Comments: 5 pages, 3 figures, 6 tables
Subjects: Audio and Speech Processing (eess.AS)
[102]  arXiv:2011.12657 [pdf, other]
Title: Zero-Shot Audio Classification with Factored Linear and Nonlinear Acoustic-Semantic Projections
Comments: Accepted by ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS)
[103]  arXiv:2011.12696 [pdf, other]
Title: Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
[104]  arXiv:2011.12941 [pdf, other]
Title: Small Footprint Convolutional Recurrent Networks for Streaming Wakeword Detection
Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Audio and Speech Processing (eess.AS)
[105]  arXiv:2011.12998 [pdf, other]
Title: VoxLingua107: a Dataset for Spoken Language Recognition
Comments: Accepted at IEEE Spoken Language Technology Workshop (SLT) 2021
Subjects: Audio and Speech Processing (eess.AS)
[106]  arXiv:2011.13090 [pdf, other]
Title: Multi-QuartzNet: Multi-Resolution Convolution for Speech Recognition with Multi-Layer Feature Fusion
Comments: will be presented in SLT 2021
Subjects: Audio and Speech Processing (eess.AS)
[107]  arXiv:2011.13834 [pdf, other]
Title: Transformer-based Online Speech Recognition with Decoder-end Adaptive Computation Steps
Comments: 7 pages, 1 figure, accepted at SLT 2021
Subjects: Audio and Speech Processing (eess.AS)
[108]  arXiv:2011.14060 [pdf, other]
Title: Unsupervised Spoken Term Discovery on Untranscribed Speech
Authors: Man-Ling Sung
Comments: Thesis submitted in September 2019 for the M.Phil degree in Electronic Engineering at The Chinese University of Hong Kong (CUHK)
Subjects: Audio and Speech Processing (eess.AS)
[109]  arXiv:2011.14062 [pdf, other]
Title: Unsupervised Spoken Term Discovery Based on Re-clustering of Hypothesized Speech Segments with Siamese and Triplet Networks
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[110]  arXiv:2011.02195 (cross-list from eess.SP) [pdf, other]
Title: Correlation based Multi-phasal models for improved imagined speech EEG recognition
Journal-ref: Interspeech SMM 2020
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[111]  arXiv:2011.08848 (cross-list from eess.SP) [pdf, ps, other]
Title: Deep Networks for Direction-of-Arrival Estimation in Low SNR
Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[112]  arXiv:2011.00196 (cross-list from cs.SD) [pdf, other]
Title: RespireNet: A Deep Neural Network for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting
Comments: Code visible at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[113]  arXiv:2011.00200 (cross-list from cs.SD) [pdf, other]
Title: The xx205 System for the VoxCeleb Speaker Recognition Challenge 2020
Authors: Xu Xiang
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[114]  arXiv:2011.00695 (cross-list from cs.SD) [pdf, other]
Title: Learning generic feature representation with synthetic data for weakly-supervised sound event detection by inter-frame distance loss
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115]  arXiv:2011.00747 (cross-list from cs.CL) [pdf, other]
Title: Dual-decoder Transformer for Joint Automatic Speech Recognition and Multilingual Speech Translation
Comments: Accepted at COLING 2020 (Oral)
Journal-ref: The 28th International Conference on Computational Linguistics (COLING 2020)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116]  arXiv:2011.00771 (cross-list from cs.LG) [pdf, ps, other]
Title: Multitask Learning and Joint Optimization for Transformer-RNN-Transducer Speech Recognition
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117]  arXiv:2011.00773 (cross-list from cs.SD) [pdf, other]
Title: Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[118]  arXiv:2011.00782 (cross-list from cs.SD) [pdf, other]
Title: CVC: Contrastive Learning for Non-parallel Voice Conversion
Comments: Submitted Interspeech 2021, Project Page: this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119]  arXiv:2011.00801 (cross-list from cs.SD) [pdf, other]
Title: Sound Event Detection and Separation: a Benchmark on Desed Synthetic Soundscapes
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120]  arXiv:2011.00803 (cross-list from cs.SD) [pdf, other]
Title: What's All the FUSS About Free Universal Sound Separation Data?
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121]  arXiv:2011.01143 (cross-list from cs.SD) [pdf, other]
Title: Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Comments: ICLR 2021, 27 pages
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[122]  arXiv:2011.01151 (cross-list from cs.SD) [pdf, other]
Title: Optimize what matters: Training DNN-HMM Keyword Spotting Model Using End Metric
Comments: Accepted at ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[123]  arXiv:2011.01447 (cross-list from cs.SD) [pdf, other]
Title: A Two-Stage Approach to Device-Robust Acoustic Scene Classification
Comments: Submitted to ICASSP 2021. Code available: this https URL
Journal-ref: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[124]  arXiv:2011.01460 (cross-list from cs.LG) [pdf, other]
Title: Training Wake Word Detection with Synthesized Speech Data on Confusion Words
Comments: Submitted to ICASSP 2021
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125]  arXiv:2011.01518 (cross-list from cs.SD) [pdf, other]
Title: ShaneRun System Description to VoxCeleb Speaker Recognition Challenge 2020
Authors: Shen Chen
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[126]  arXiv:2011.01561 (cross-list from cs.SD) [pdf, other]
Title: Two Heads Are Better Than One: A Two-Stage Approach for Monaural Noise Reduction in the Complex Domain
Comments: Submitted to ICASSP 2021, 5 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[127]  arXiv:2011.01709 (cross-list from cs.SD) [pdf, other]
Title: Small footprint Text-Independent Speaker Verification for Embedded Systems
Journal-ref: Acoustics, Speech and Signal Processing (ICASSP), 2021 IEEE International Conference
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[128]  arXiv:2011.01761 (cross-list from cs.LG) [pdf, other]
Title: Problems using deep generative models for probabilistic audio source separation
Journal-ref: 1st I Can't Believe It's Not Better Workshop (ICBINB @ NeurIPS 2020), Vancouver, Canada
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129]  arXiv:2011.02099 (cross-list from cs.CL) [pdf, other]
Title: Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework
Comments: Accepted at INTERSPEECH 2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130]  arXiv:2011.02110 (cross-list from cs.SD) [pdf, other]
Title: Can We Trust Deep Speech Prior?
Comments: To be published in IEEE SLT 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[131]  arXiv:2011.02126 (cross-list from cs.CL) [pdf, other]
Title: Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time
Comments: Accepted in INTERSPEECH 2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132]  arXiv:2011.02127 (cross-list from cs.CL) [pdf, other]
Title: Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
Comments: Accepted in INTERSPEECH 2019
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133]  arXiv:2011.02128 (cross-list from cs.CL) [pdf, other]
Title: Cross-Lingual Machine Speech Chain for Javanese, Sundanese, Balinese, and Bataks Speech Recognition and Synthesis
Comments: Accepted in SLTU-CCURL 2020
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[134]  arXiv:2011.02131 (cross-list from cs.SD) [pdf, other]
Title: DESNet: A Multi-channel Network for Simultaneous Speech Dereverberation, Enhancement and Separation
Comments: Accepted at IEEE SLT 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135]  arXiv:2011.02160 (cross-list from cs.CL) [pdf, other]
Title: Data Augmentation for End-to-end Code-switching Speech Recognition
Comments: Accepted by SLT2021
Journal-ref: 2021 IEEE Spoken Language Technology Workshop (SLT)
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[136]  arXiv:2011.02198 (cross-list from cs.SD) [pdf, other]
Title: IEEE SLT 2021 Alpha-mini Speech Challenge: Open Datasets, Tracks, Rules and Baselines
Comments: Accepted at IEEE SLT 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[137]  arXiv:2011.02314 (cross-list from cs.SD) [pdf, other]
Title: VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech
Comments: Accepted by IEEE SLT 2021. arXiv admin note: text overlap with arXiv:2005.07025
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[138]  arXiv:2011.02329 (cross-list from cs.SD) [pdf, other]
Title: Single channel voice separation for unknown number of speakers under reverberant and noisy settings
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[139]  arXiv:2011.02678 (cross-list from cs.SD) [pdf, other]
Title: BW-EDA-EEND: Streaming End-to-End Neural Speaker Diarization for a Variable Number of Speakers
Journal-ref: Proc. IEEE ICASSP, June 2021, pp. 7193-7197
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[140]  arXiv:2011.02809 (cross-list from cs.SD) [pdf, other]
Title: Semi-supervised Learning for Singing Synthesis Timbre
Comments: 5 pages, 1 figure, submitted to ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[141]  arXiv:2011.02874 (cross-list from cs.SD) [pdf, ps, other]
Title: Influence of Event Duration on Automatic Wheeze Classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[142]  arXiv:2011.02882 (cross-list from cs.SD) [pdf, ps, other]
Title: Query Expansion System for the VoxCeleb Speaker Recognition Challenge 2020
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[143]  arXiv:2011.03028 (cross-list from cs.SD) [pdf, other]
Title: From Note-Level to Chord-Level Neural Network Models for Voice Separation in Symbolic Music
Comments: Paper submitted for publication in August 2018
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[144]  arXiv:2011.03072 (cross-list from cs.CL) [pdf, other]
Title: Alignment Restricted Streaming Recurrent Neural Network Transducer
Comments: Accepted for presentation at IEEE Spoken Language Technology Workshop (SLT) 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145]  arXiv:2011.03109 (cross-list from cs.CL) [pdf, other]
Title: Improving RNN Transducer Based ASR with Auxiliary Tasks
Comments: Accepted for publication at IEEE Spoken Language Technology Workshop (SLT), 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146]  arXiv:2011.03414 (cross-list from cs.SD) [pdf, ps, other]
Title: Robust ENF Estimation Based on Harmonic Enhancement and Maximum Weight Clique
Journal-ref: IEEE Transactions on Information Forensics and Security, 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[147]  arXiv:2011.03530 (cross-list from cs.CV) [pdf, other]
Title: Large-scale multilingual audio visual dubbing
Comments: 26 pages, 8 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148]  arXiv:2011.03568 (cross-list from cs.CL) [pdf, other]
Title: Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Comments: 6 pages including supplement, 3 figures. accepted to ICASSP 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149]  arXiv:2011.03682 (cross-list from cs.SD) [pdf, other]
Title: Non-local convolutional neural networks (nlcnn) for speaker recognition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150]  arXiv:2011.03689 (cross-list from cs.SD) [pdf, other]
Title: Detection and Evaluation of human and machine generated speech in spoofing attacks on automatic speaker verification systems
Comments: 6 pages excluding references. Paper accepted by IEEE Spoken Language Technology (SLT) 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[151]  arXiv:2011.03840 (cross-list from cs.SD) [pdf, other]
Title: Dual Application of Speech Enhancement for Automatic Speech Recognition
Comments: Accepted for publication in SLT 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[152]  arXiv:2011.03955 (cross-list from cs.SD) [pdf, other]
Title: Denoising-and-Dereverberation Hierarchical Neural Vocoder for Robust Waveform Generation
Comments: Accepted by SLT 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[153]  arXiv:2011.04004 (cross-list from cs.CL) [pdf, other]
Title: Stochastic Attention Head Removal: A simple and effective method for improving Transformer Based ASR Models
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[154]  arXiv:2011.04092 (cross-list from cs.SD) [pdf, other]
Title: Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain
Subjects: Sound (cs.SD); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[155]  arXiv:2011.04249 (cross-list from cs.SD) [pdf, other]
Title: Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition
Comments: Accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[156]  arXiv:2011.04292 (cross-list from cs.SD) [pdf, ps, other]
Title: STOI-Net: A Deep Learning based Non-Intrusive Speech Intelligibility Assessment Model
Comments: Accepted in APSIPA 2020
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[157]  arXiv:2011.04297 (cross-list from cs.SD) [pdf, other]
Title: Knowledge Distillation for Singing Voice Detection
Comments: Accepted at INTERSPEECH 2021. 5 pages, 3 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[158]  arXiv:2011.04299 (cross-list from cs.SD) [pdf, other]
Title: COVID-19 Patient Detection from Telephone Quality Speech Data
Comments: 6 pages, 7 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[159]  arXiv:2011.04491 (cross-list from cs.SD) [pdf, other]
Title: Masked Proxy Loss For Text-Independent Speaker Verification
Comments: Accepted at Interspeech 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[160]  arXiv:2011.04547 (cross-list from cs.SD) [pdf, ps, other]
Title: Data Augmentation For Children's Speech Recognition -- The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge
Comments: System description of the SLT 2021 Children Speech Recognition Challenge
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[161]  arXiv:2011.04568 (cross-list from cs.SD) [pdf, ps, other]
Title: Musical analysis of Stravinski's "The Rite of Spring" based on computational methods
Comments: Audio and Music Processing Lab, 2017
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[162]  arXiv:2011.04609 (cross-list from cs.SD) [pdf, other]
Title: FRILL: A Non-Semantic Speech Embedding for Mobile Devices
Comments: Accepted to Interspeech 2021
Journal-ref: Proc. Interspeech 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[163]  arXiv:2011.04696 (cross-list from cs.SD) [pdf, other]
Title: Speaker De-identification System using Autoencoders and Adversarial Training
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[164]  arXiv:2011.04906 (cross-list from cs.CL) [pdf, other]
Title: On the Usefulness of Self-Attention for Automatic Speech Recognition with Transformers
Comments: arXiv admin note: substantial text overlap with arXiv:2005.13895
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[165]  arXiv:2011.04974 (cross-list from cs.SD) [pdf, other]
Title: Deconstruct and Reconstruct Dizi Music of the Northern School and the Southern School
Comments: Best Student Paper in The 8th Conference on Sound and Music Technology (CSMT)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[166]  arXiv:2011.05158 (cross-list from cs.SD) [pdf, other]
Title: GANterpretations
Comments: In 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[167]  arXiv:2011.05189 (cross-list from cs.SD) [pdf, other]
Title: Supervised attention for speaker recognition
Comments: SLT 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[168]  arXiv:2011.05463 (cross-list from cs.CL) [pdf, other]
Title: Deep Sound Change: Deep and Iterative Learning, Convolutional Neural Networks, and Language Change
Authors: Gašper Beguš
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[169]  arXiv:2011.05585 (cross-list from cs.LG) [pdf, other]
Title: Recognizing More Emotions with Less Data Using Self-supervised Transfer Learning
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[170]  arXiv:2011.05591 (cross-list from cs.SD) [pdf, other]
Title: Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning
Comments: Accepted by ISCSLP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[171]  arXiv:2011.06380 (cross-list from cs.SD) [pdf, other]
Title: Automatic Neural Lyrics and Melody Composition
Comments: 15 pages
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[172]  arXiv:2011.06392 (cross-list from cs.SD) [pdf, other]
Title: Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement
Comments: Preprint
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[173]  arXiv:2011.06724 (cross-list from cs.SD) [pdf, other]
Title: The SLT 2021 children speech recognition challenge: Open datasets, rules and baselines
Comments: 7 pages, 3 figures, 3 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[174]  arXiv:2011.06801 (cross-list from cs.SD) [pdf, other]
Title: A Comprehensive Survey on Deep Music Generation: Multi-level Representations, Algorithms, Evaluations, and Future Directions
Comments: 96 pages, this is a draft
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[175]  arXiv:2011.06846 (cross-list from cs.LG) [pdf, other]
Title: Low-activity supervised convolutional spiking neural networks applied to speech commands recognition
Comments: Accepted to IEEE Spoken Language Technology Workshop 2021
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[176]  arXiv:2011.07348 (cross-list from cs.SD) [pdf, other]
Title: Communication-Cost Aware Microphone Selection For Neural Speech Enhancement with Ad-hoc Microphone Arrays
Comments: 5 pages, 4 figures, ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[177]  arXiv:2011.07430 (cross-list from cs.CV) [pdf, other]
Title: Audio-Visual Event Recognition through the lens of Adversary
Comments: 4 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[178]  arXiv:2011.07442 (cross-list from cs.SD) [pdf, other]
Title: Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information
Comments: To appear in IEEE Transactions on Audio, Speech and Language Processing (TASLP)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[179]  arXiv:2011.07542 (cross-list from cs.SD) [pdf, other]
Title: Automatic and perceptual discrimination between dysarthria, apraxia of speech, and neurotypical speech
Comments: ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[180]  arXiv:2011.07546 (cross-list from cs.SD) [pdf, other]
Title: Learning Frame Similarity using Siamese networks for Audio-to-Score Alignment
Comments: Accepted at EUSIPCO 2020
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[181]  arXiv:2011.07616 (cross-list from cs.SD) [pdf, other]
Title: Unsupervised Contrastive Learning of Sound Event Representations
Comments: A 4-page version is submitted to ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[182]  arXiv:2011.07754 (cross-list from cs.CL) [pdf, other]
Title: Deep Shallow Fusion for RNN-T Personalization
Comments: To appear at SLT 2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[183]  arXiv:2011.07953 (cross-list from cs.CV) [pdf, other]
Title: Shimon the Robot Film Composer and DeepScore: An LSTM for Generation of Film Scores based on Visual Analysis
Comments: Computer Simulation of Musical Creativity, 20th-22nd August, University College Dublin
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[184]  arXiv:2011.08238 (cross-list from cs.CL) [pdf, ps, other]
Title: End-to-end spoken language understanding using transformer networks and self-supervised pre-trained features
Comments: 5 pages, 3 tables and 1 figure
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[185]  arXiv:2011.08467 (cross-list from cs.SD) [pdf, other]
Title: Learn2Sing: Target Speaker Singing Voice Synthesis by learning from a Singing Teacher
Comments: 8 pages, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[186]  arXiv:2011.08469 (cross-list from cs.SD) [pdf, other]
Title: Cascade RNN-Transducer: Syllable Based Streaming On-device Mandarin Speech Recognition with a Syllable-to-Character Converter
Comments: 7 pages, 3 figures, 5 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[187]  arXiv:2011.08477 (cross-list from cs.SD) [pdf, other]
Title: Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Authors: Yi Lei, Shan Yang, Lei Xie
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[188]  arXiv:2011.08483 (cross-list from cs.SD) [pdf, other]
Title: FoolHD: Fooling speaker identification by Highly imperceptible adversarial Disturbances
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[189]  arXiv:2011.08548 (cross-list from cs.SD) [pdf, other]
Title: Optimizing voice conversion network with cycle consistency loss of speaker identity
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[190]  arXiv:2011.08609 (cross-list from cs.SD) [pdf, other]
Title: Accent and Speaker Disentanglement in Many-to-many Voice Conversion
Comments: Accepted to ISCSLP2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[191]  arXiv:2011.08623 (cross-list from cs.SD) [pdf, other]
Title: Adversarial Training for Multi-domain Speaker Recognition
Comments: 5 pages, 2 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[192]  arXiv:2011.08679 (cross-list from cs.SD) [pdf, other]
Title: Controllable Emotion Transfer For End-to-End Speech Synthesis
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[193]  arXiv:2011.09078 (cross-list from cs.SD) [pdf, other]
Title: Vertical-Horizontal Structured Attention for Generating Music with Chords
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[194]  arXiv:2011.09081 (cross-list from cs.SD) [pdf, other]
Title: Multi-Channel Automatic Speech Recognition Using Deep Complex Unet
Comments: 7 pages, 4 figures, IEEE SLT 2021 Technical Committee
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[195]  arXiv:2011.09143 (cross-list from cs.SD) [pdf, other]
Title: Expanding Access to Music Technology -- Rapid Prototyping Accessible Instrument Solutions For Musicians With Intellectual Disabilities
Comments: Proceedings of the International Conference on New Interfaces for Musical Expression, 2020
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[196]  arXiv:2011.09272 (cross-list from cs.CL) [pdf, other]
Title: Combining Prosodic, Voice Quality and Lexical Features to Automatically Detect Alzheimer's Disease
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[197]  arXiv:2011.09299 (cross-list from cs.SD) [pdf, other]
Title: CAA-Net: Conditional Atrous CNNs with Attention for Explainable Device-robust Acoustic Scene Classification
Comments: IEEE Transactions on Multimedia
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[198]  arXiv:2011.09301 (cross-list from cs.SD) [pdf, other]
Title: Context-aware RNNLM Rescoring for Conversational Speech Recognition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[199]  arXiv:2011.09744 (cross-list from cs.LG) [pdf, other]
Title: End-To-End Dilated Variational Autoencoder with Bottleneck Discriminative Loss for Sound Morphing -- A Preliminary Study
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[200]  arXiv:2011.09767 (cross-list from cs.SD) [pdf, other]
Title: Deep Residual Local Feature Learning for Speech Emotion Recognition
Comments: 12 pages, 5 figures, submitted for review
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[201]  arXiv:2011.10233 (cross-list from cs.SD) [pdf, other]
Title: One Shot Learning for Speech Separation
Comments: Accepted to ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[202]  arXiv:2011.10469 (cross-list from cs.LG) [pdf, other]
Title: Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[203]  arXiv:2011.10710 (cross-list from cs.SD) [pdf, other]
Title: Exploring Voice Conversion based Data Augmentation in Text-Dependent Speaker Verification
Comments: Submitted to ICASSP2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[204]  arXiv:2011.11436 (cross-list from cs.SD) [pdf, other]
Title: Speech Command Recognition in Computationally Constrained Environments with a Quadratic Self-organized Operational Layer
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[205]  arXiv:2011.11588 (cross-list from cs.CL) [pdf, other]
Title: The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
Comments: 14 pages, including references and supplementary material
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[206]  arXiv:2011.11715 (cross-list from cs.CL) [pdf, other]
Title: Multi-task Language Modeling for Improving Speech Recognition of Rare Words
Comments: Accepted to IEEE Automatic Speech Recognition and Understanding (ASRU) 2021
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[207]  arXiv:2011.11970 (cross-list from cs.SD) [pdf, other]
Title: A Novel Multimodal Music Genre Classifier using Hierarchical Attention and Convolutional Neural Network
Comments: 7 pages, 4 figures
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[208]  arXiv:2011.12022 (cross-list from cs.SD) [pdf, other]
Title: Multi-Decoder DPRNN: High Accuracy Source Counting and Separation
Comments: Project Page: this https URL Submitted to ICASSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[209]  arXiv:2011.12536 (cross-list from cs.SD) [pdf, ps, other]
Title: Vocal Tract Length Perturbation for Text-Dependent Speaker Verification with Autoregressive Prediction Coding
Authors: Achintya kr. Sarkar, Zheng-Hua Tan (Senior Member, IEEE)
Comments: Copyright (c) 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Journal-ref: IEEE Signal Processing Letters, vol. 28, pp. 364-368, 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[210]  arXiv:2011.12596 (cross-list from cs.SD) [pdf, other]
Title: MTCRNN: A multi-scale RNN for directed audio texture synthesis
Authors: M. Huzaifah, L. Wyse
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[211]  arXiv:2011.12649 (cross-list from cs.CL) [pdf, other]
Title: Neural Representations for Modeling Variation in Speech
Comments: Submitted to Journal of Phonetics
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[212]  arXiv:2011.12985 (cross-list from cs.SD) [pdf, other]
Title: FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[213]  arXiv:2011.12999 (cross-list from cs.GR) [pdf, other]
Title: Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio
Comments: Accepted at the Elsevier Computers & Graphics (C&G) 2020
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[214]  arXiv:2011.13122 (cross-list from cs.SD) [pdf, other]
Title: Real-time error correction and performance aid for MIDI instruments
Authors: Georgi Marinov
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[215]  arXiv:2011.13148 (cross-list from cs.SD) [pdf, ps, other]
Title: Streaming end-to-end multi-talker speech recognition
Comments: 5 pages, 3 figures. Accepted to IEEE Signal Processing Letters 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[216]  arXiv:2011.13320 (cross-list from cs.SD) [pdf, ps, other]
Title: Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[217]  arXiv:2011.13393 (cross-list from cs.SD) [pdf, other]
Title: Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation
Comments: Accepted by ICASSP2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[218]  arXiv:2011.13439 (cross-list from cs.CL) [pdf, other]
Title: Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training
Comments: ICASSP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[219]  arXiv:2011.13453 (cross-list from cs.SD) [pdf, other]
Title: Towards Movement Generation with Audio Features
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[220]  arXiv:2011.13645 (cross-list from cs.CE) [pdf, ps, other]
Title: Numerical and experimental study of tonal noise sources at the outlet of an isolated centrifugal fan
Subjects: Computational Engineering, Finance, and Science (cs.CE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[221]  arXiv:2011.14334 (cross-list from cs.SD) [pdf, other]
Title: Audio-visual Speech Separation with Adversarially Disentangled Visual Representation
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[222]  arXiv:2011.14336 (cross-list from cs.SD) [pdf, ps, other]
Title: An Features Extraction and Recognition Method for Underwater Acoustic Target Based on ATCNN
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[223]  arXiv:2011.14445 (cross-list from cs.SD) [pdf, other]
Title: Audio, Speech, Language, & Signal Processing for COVID-19: A Comprehensive Overview
Comments: arXiv admin note: text overlap with arXiv:2005.08579
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[224]  arXiv:2011.14885 (cross-list from cs.SD) [pdf, ps, other]
Title: Look who's not talking
Comments: SLT 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[225]  arXiv:2011.15003 (cross-list from cs.SD) [pdf, other]
Title: Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation
Comments: Accepted by ICASSP 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[226]  arXiv:2011.15023 (cross-list from cs.CL) [pdf, other]
Title: Transformer-Transducers for Code-Switched Speech Recognition
Comments: Accepted at ICASSP 2021
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[227]  arXiv:2011.15096 (cross-list from cs.HC) [pdf, other]
Title: A proposal and evaluation of new timbre visualisation methods for audio sample browsers
Comments: 14 pages. Personal and Ubiquitous Computing (2020)
Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)