Sound

Authors and titles for cs.SD in Apr 2021

[ total of 229 entries: 1-229 ]
[1]  arXiv:2104.00355 [pdf, other]
Title: Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Comments: In Proceedings of Interspeech 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2]  arXiv:2104.00437 [pdf, other]
Title: Enriched Music Representations with Multiple Cross-modal Contrastive Learning
Comments: Accepted for publication to IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[3]  arXiv:2104.00513 [pdf, other]
Title: Auto-KWS 2021 Challenge: Task, Datasets, and Baselines
Comments: 5 pages, 2 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[4]  arXiv:2104.00528 [pdf, other]
Title: OutlierNets: Highly Compact Deep Autoencoder Network Architectures for On-Device Acoustic Anomaly Detection
Comments: 7 pages
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
[5]  arXiv:2104.00705 [pdf, other]
Title: Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6]  arXiv:2104.00732 [pdf, other]
Title: Out of a hundred trials, how many errors does your speaker verifier make?
Comments: Submitted to Interspeech 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[7]  arXiv:2104.01027 [pdf, other]
Title: Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8]  arXiv:2104.01160 [pdf, other]
Title: PhyAug: Physics-Directed Data Augmentation for Deep Sensing Model Transfer in Cyber-Physical Systems
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[9]  arXiv:2104.01161 [pdf, ps, other]
Title: An Audio-Based Deep Learning Framework For BBC Television Programme Classification
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10]  arXiv:2104.01271 [pdf, other]
Title: PATE-AAE: Incorporating Adversarial Autoencoder into Private Aggregation of Teacher Ensembles for Spoken Command Classification
Comments: Accepted to Interspeech 2021
Journal-ref: Proc. Interspeech 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[11]  arXiv:2104.01304 [pdf, other]
Title: Diarization of Legal Proceedings. Identifying and Transcribing Judicial Speech from Recorded Court Audio
Comments: Under review for InterSpeech 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[12]  arXiv:2104.01444 [pdf, ps, other]
Title: Mixture of orthogonal sequences made from extended time-stretched pulses enables measurement of involuntary voice fundamental frequency response to pitch perturbation
Comments: 5 pages, 9 figures, submitted to Interspeech2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[13]  arXiv:2104.01778 [pdf, other]
Title: AST: Audio Spectrogram Transformer
Comments: Accepted at Interspeech 2021. Code at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI)
[14]  arXiv:2104.01807 [pdf, other]
Title: StarGAN-based Emotional Voice Conversion for Japanese Phrases
Comments: Submitted to Interspeech 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[15]  arXiv:2104.01978 [pdf, other]
Title: Acted vs. Improvised: Domain Adaptation for Elicitation Approaches in Audio-Visual Emotion Recognition
Comments: paper accepted by INTERSPEECH2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[16]  arXiv:2104.02005 [pdf, other]
Title: Uncertainty-Aware COVID-19 Detection from Imbalanced Sound Data
Comments: Accepted by INTERSPEECH 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17]  arXiv:2104.02109 [pdf, other]
Title: Streaming Multi-talker Speech Recognition with Joint Speaker Identification
Comments: 5 pages, 2 figures, submitted to Interspeech 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL)
[18]  arXiv:2104.02207 [pdf, other]
Title: Dissecting User-Perceived Latency of On-Device E2E Speech Recognition
Comments: Proc. of Interspeech 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[19]  arXiv:2104.02232 [pdf, other]
Title: Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios
Comments: Submitted to Interspeech 2021 (under review)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20]  arXiv:2104.02306 [pdf, other]
Title: Binary Neural Network for Speaker Verification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21]  arXiv:2104.02309 [pdf, other]
Title: MuSLCAT: Multi-Scale Multi-Level Convolutional Attention Transformer for Discriminative Music Modeling on Raw Waveforms
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22]  arXiv:2104.02387 [pdf, other]
Title: Towards Consistent Hybrid HMM Acoustic Modeling
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23]  arXiv:2104.02477 [pdf, other]
Title: COVID-19 Detection in Cough, Breath and Speech using Deep Transfer Learning and Bottleneck Features
Journal-ref: Computers in Biology and Medicine, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2104.02535 [pdf, other]
Title: Optimal Transport-based Adaptation in Dysarthric Speech Tasks
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[25]  arXiv:2104.02558 [pdf, ps, other]
Title: Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[26]  arXiv:2104.02868 [pdf, other]
Title: Darts-Conformer: Towards Efficient Gradient-Based Neural Architecture Search For End-to-End ASR
Comments: Submitted to ASRU 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[27]  arXiv:2104.03204 [pdf, other]
Title: Learning robust speech representation with an articulatory-regularized variational autoencoder
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[28]  arXiv:2104.03502 [pdf, other]
Title: Emotion Recognition from Speech Using Wav2vec 2.0 Embeddings
Comments: 5 pages, 2 figures. Submitted to Interspeech 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[29]  arXiv:2104.03521 [pdf, other]
Title: Towards Multi-Scale Style Control for Expressive Speech Synthesis
Comments: 5 pages, 4 figures, submitted to INTERSPEECH 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[30]  arXiv:2104.03538 [pdf]
Title: MetricGAN+: An Improved Version of MetricGAN for Speech Enhancement
Comments: Accepted by Interspeech 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[31]  arXiv:2104.03587 [pdf, other]
Title: WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[32]  arXiv:2104.03603 [pdf, other]
Title: AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
Comments: Accepted by Interspeech 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33]  arXiv:2104.03617 [pdf, other]
Title: Half-Truth: A Partially Fake Audio Detection Dataset
Comments: submitted to Interspeech 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[34]  arXiv:2104.03838 [pdf, other]
Title: Speech Denoising Without Clean Training Data: A Noise2Noise Approach
Comments: Published in Interspeech 2021 ( See this https URL ). 5 pages, 2 figures, 1 table
Journal-ref: Proc. Interspeech 2021, 2716-2720
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35]  arXiv:2104.03876 [pdf, other]
Title: SerumRNN: Step by Step Audio VST Effect Programming
Comments: Audio samples of the system can be listened to at bit.ly/serum_rnn
Journal-ref: 10th International Conference on Artificial Intelligence in Music, Sound, Art, and Design (EvoMUSART 2021), Seville, Spain
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36]  arXiv:2104.04050 [pdf, other]
Title: Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features
Comments: 5
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG)
[37]  arXiv:2104.04111 [pdf, other]
Title: Generalized Spoofing Detection Inspired from Audio Generation Artifacts
Comments: Camera ready version. Accepted by INTERSPEECH 2021
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[38]  arXiv:2104.04143 [pdf, other]
Title: Heaps' Law and Vocabulary Richness in the History of Classical Music Harmony
Comments: 12 pages
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Physics and Society (physics.soc-ph)
[39]  arXiv:2104.04325 [pdf, other]
Title: Joint Online Multichannel Acoustic Echo Cancellation, Speech Dereverberation and Source Separation
Comments: submitted to INTERSPEECH 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40]  arXiv:2104.04598 [pdf, other]
Title: Cross-Modal learning for Audio-Visual Video Parsing
Comments: Work accepted at Interspeech 2021
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[41]  arXiv:2104.04668 [pdf, other]
Title: Unified Source-Filter GAN: Unified Source-filter Network Based On Factorization of Quasi-Periodic Parallel WaveGAN
Comments: Submitted to INTERSPEECH 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[42]  arXiv:2104.04702 [pdf, other]
Title: Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR
Comments: 5 pages, 4 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43]  arXiv:2104.05657 [pdf, other]
Title: End-to-End Mandarin Tone Classification with Short Term Context Information
Authors: Jiyang Tang, Ming Li
Comments: Accepted by APSIPA ASC 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44]  arXiv:2104.05784 [pdf, other]
Title: Extremely Low Footprint End-to-End ASR System for Smart Device
Comments: 5 pages, 2 figures, accepted by INTERSPEECH 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45]  arXiv:2104.06004 [pdf, other]
Title: Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Lexical Information Fusion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[46]  arXiv:2104.06074 [pdf, other]
Title: NoiseVC: Towards High Quality Zero-Shot Voice Conversion
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47]  arXiv:2104.06162 [pdf, other]
Title: Visually Informed Binaural Audio Generation without Binaural Audios
Comments: Accepted by CVPR 2021. Code, models, and demo video are available on our webpage: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[48]  arXiv:2104.06517 [pdf, other]
Title: Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition
Comments: AAAI Workshop on Affective Content Analysis 2021 Camera Ready Version
Journal-ref: AAAI 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[49]  arXiv:2104.06607 [pdf, other]
Title: Revisiting the Onsets and Frames Model with Additive Attention
Comments: Accepted in IJCNN 2021 Special Session S04. this https URL
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2104.06666 [pdf, other]
Title: End-to-end Keyword Spotting using Neural Architecture Search and Quantization
Comments: arXiv admin note: text overlap with arXiv:2012.10138
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[51]  arXiv:2104.06793 [pdf, other]
Title: Non-autoregressive sequence-to-sequence voice conversion
Comments: Accepted to ICASSP2021. Demo HP: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[52]  arXiv:2104.06865 [pdf, other]
Title: Efficient conformer-based speech recognition with linear attention
Comments: submitted to APSIPA ASC 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[53]  arXiv:2104.06900 [pdf, ps, other]
Title: FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[54]  arXiv:2104.07128 [pdf, ps, other]
Title: Audio feature ranking for sound-based COVID-19 patient detection
Comments: 22 pages, 6 figures, 8 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[55]  arXiv:2104.07161 [pdf, other]
Title: On the Design of Deep Priors for Unsupervised Audio Restoration
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[56]  arXiv:2104.07286 [pdf, other]
Title: Continual Learning for Fake Audio Detection
Comments: 5 pages, conference
Journal-ref: Proc. Interspeech 2021, 886-890
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[57]  arXiv:2104.07491 [pdf, other]
Title: Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching
Comments: Accepted to INTERSPEECH 2021; code available at this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[58]  arXiv:2104.07519 [pdf, other]
Title: Spectrogram Inpainting for Interactive Generation of Instrument Sounds
Comments: 8 pages + references + appendices. 4 figures. Published as a conference paper at the 2020 Joint Conference on AI Music Creativity, October 19-23, 2020, organized and hosted virtually by the Royal Institute of Technology (KTH), Stockholm, Sweden
Journal-ref: Proceedings of the 1st Joint Conference on AI Music Creativity, 2020 (p. 10). Stockholm, Sweden: AIMC
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[59]  arXiv:2104.08450 [pdf, other]
Title: MIMO Self-attentive RNN Beamformer for Multi-speaker Speech Separation
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[60]  arXiv:2104.08580 [pdf, other]
Title: Uncovering audio patterns in music with Nonnegative Tucker Decomposition for structural segmentation
Authors: Axel Marmoret (1), Jérémy E. Cohen (1), Nancy Bertin (1), Frédéric Bimbot (1) ((1) Univ Rennes, Inria, CNRS, IRISA, France.)
Comments: 7 pages, 6 figures; Code and experiments details available at this https URL; Experiments details available at this https URL
Journal-ref: 21st International Society for Music Information Retrieval Conference (ISMIR), Montréal, Canada, 2020, 788-794
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[61]  arXiv:2104.08614 [pdf]
Title: Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[62]  arXiv:2104.08806 [pdf, other]
Title: Best Practices for Noise-Based Augmentation to Improve the Performance of Emotion Recognition "In the Wild"
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[63]  arXiv:2104.08872 [pdf, ps, other]
Title: Low-Frequency Characterization of Music Sounds -- Ultra-Bass Richness from the Sound Wave Beats
Comments: 23 pages, 7 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); General Physics (physics.gen-ph)
[64]  arXiv:2104.08955 [pdf, other]
Title: Many-Speakers Single Channel Speech Separation with Optimal Permutation Training
Comments: Accepted to Interspeech 2021, Data creation link added
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[65]  arXiv:2104.09018 [pdf, other]
Title: An Interdisciplinary Review of Music Performance Analysis
Comments: arXiv admin note: substantial text overlap with arXiv:1907.00178
Journal-ref: Transactions of the International Society for Music Information Retrieval, 3(1), pp.221-245, 2020
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[66]  arXiv:2104.09489 [pdf, other]
Title: Interpreting intermediate convolutional layers of generative CNNs trained on waveforms
Comments: IEEE/ACM Transactions on Audio Speech and Language Processing
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[67]  arXiv:2104.09715 [pdf, other]
Title: AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
Comments: Accepted by ICASSP 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[68]  arXiv:2104.09748 [pdf, other]
Title: Waveform Phasicity Prediction from Arterial Sounds through Spectrogram Analysis using Convolutional Neural Networks for Limb Perfusion Assessment
Comments: 5 pages, 8 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[69]  arXiv:2104.09832 [pdf]
Title: Identification of fake stereo audio
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[70]  arXiv:2104.09946 [pdf, other]
Title: A cappella: Audio-visual Singing Voice Separation
Comments: Paper accepted at The 32nd British Machine Vision Conference, BMVC 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[71]  arXiv:2104.09995 [pdf, other]
Title: Review of end-to-end speech synthesis technology based on deep learning
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[72]  arXiv:2104.10121 [pdf, other]
Title: On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era
Authors: Shahin Amiriparian (1), Artem Sokolov (2,3), Ilhan Aslan (2), Lukas Christ (1), Maurice Gerczuk (1), Tobias Hübner (1), Dmitry Lamanov (2), Manuel Milling (1), Sandra Ottl (1), Ilya Poduremennykh (2), Evgeniy Shuranov (2,4), Björn W. Schuller (1,5) ((1) EIHW -- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany, (2) Huawei Technologies, (3) HSE University, Nizhniy Novgorod, Russia, (4) ITMO University, Saint Petersburg, Russia)
Comments: 5 pages, 1 figure
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[73]  arXiv:2104.10431 [pdf, other]
Title: Room adaptive conditioning method for sound event classification in reverberant environments
Comments: 5 pages, 3 figures, In Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74]  arXiv:2104.11051 [pdf, other]
Title: Protecting gender and identity with disentangled speech representations
Comments: 5 pages, 2 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[75]  arXiv:2104.11347 [pdf, ps, other]
Title: Restoring degraded speech via a modified diffusion model
Journal-ref: Proc. Interspeech 2021, 221-225, 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[76]  arXiv:2104.11395 [pdf]
Title: Infant Vocal Tract Development Analysis and Diagnosis by Cry Signals with CNN Age Classification
Authors: Chunyan Ji, Yi Pan
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[77]  arXiv:2104.11532 [pdf, ps, other]
Title: 3D Convolutional Neural Networks for Ultrasound-Based Silent Speech Interfaces
Comments: 10 pages, 2 tables, 3 figures
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[78]  arXiv:2104.11587 [pdf, other]
Title: ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio
Comments: submitted IJCNN 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79]  arXiv:2104.11598 [pdf, ps, other]
Title: Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders
Comments: 6 pages. 4 tables, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80]  arXiv:2104.11601 [pdf, ps, other]
Title: Improving Neural Silent Speech Interface Models by Adversarial Training
Comments: 11 pages, 3 tables, 2 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81]  arXiv:2104.11629 [pdf, other]
Title: DeepSpectrumLite: A Power-Efficient Transfer Learning Framework for Embedded Speech and Audio Processing from Decentralised Data
Authors: Shahin Amiriparian (1), Tobias Hübner (1), Maurice Gerczuk (1), Sandra Ottl (1), Björn W. Schuller (1,2) ((1) EIHW -- Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany, (2) GLAM -- Group on Language, Audio, and Music, Imperial College London, UK)
Comments: 5 pages, 1 figure
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[82]  arXiv:2104.11673 [pdf, other]
Title: Deep Learning Based Assessment of Synthetic Speech Naturalness
Comments: Late upload, presented at Interspeech 2020
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[83]  arXiv:2104.11710 [pdf, other]
Title: Beyond Voice Activity Detection: Hybrid Audio Segmentation for Direct Speech Translation
Comments: Accepted to ICNLSP 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[84]  arXiv:2104.11880 [pdf, other]
Title: Music Embedding: A Tool for Incorporating Music Theory into Computational Music Applications
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[85]  arXiv:2104.11984 [pdf, other]
Title: MusCaps: Generating Captions for Music Audio
Comments: Accepted to IJCNN 2021 for the Special Session on Representation Learning for Audio, Speech, and Music Processing
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[86]  arXiv:2104.12159 [pdf, other]
Title: An Adaptive Learning based Generative Adversarial Network for One-To-One Voice Conversion
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[87]  arXiv:2104.12292 [pdf, other]
Title: Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Comments: In the proceedings of ISCA Speech Synthesis Workshop 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[88]  arXiv:2104.12359 [pdf, other]
Title: Complex Neural Spatial Filter: Enhancing Multi-channel Target Speech Separation in Complex Domain
Comments: 5 pages, 3 figures
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89]  arXiv:2104.12432 [pdf, ps, other]
Title: Generation of musical patterns through operads
Authors: Samuele Giraudo
Comments: 10 pages
Journal-ref: Journées d'informatique musicale, 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Combinatorics (math.CO)
[90]  arXiv:2104.12462 [pdf, other]
Title: Points2Sound: From mono to binaural audio using 3D point cloud scenes
Comments: Code, data, and listening examples: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[91]  arXiv:2104.12693 [pdf, other]
Title: Identifying Actions for Sound Event Classification
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92]  arXiv:2104.12807 [pdf, other]
Title: Multimodal Self-Supervised Learning of General Audio Representations
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93]  arXiv:2104.12922 [pdf, other]
Title: One Billion Audio Sounds from GPU-enabled Modular Synthesis
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[94]  arXiv:2104.13002 [pdf, other]
Title: DPT-FSNet: Dual-path Transformer Based Full-band and Sub-band Fusion Network for Speech Enhancement
Comments: Accepted by ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95]  arXiv:2104.13040 [pdf, ps, other]
Title: The music box operad: Random generation of musical phrases from patterns
Authors: Samuele Giraudo
Comments: 31 pages. Extended version of arXiv:2104.12432
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Combinatorics (math.CO); Quantum Algebra (math.QA)
[96]  arXiv:2104.13056 [pdf, other]
Title: Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework
Comments: Accepted for the International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18-22 July 2021 (virtual)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[97]  arXiv:2104.13266 [pdf, other]
Title: Batebit Controller: Popularizing Digital Musical Instruments Development Process
Comments: 2 pages, 2 figures, 17th Brazilian Symposium on Computer Music
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[98]  arXiv:2104.13276 [pdf, other]
Title: MULTIMODAL ANALYSIS: Informed content estimation and audio source separation
Comments: Ph.D. dissertation. Thesis supervisor: Geoffroy Peeters. Jury: Laurent Girin, Gaël Richard, Rachel Bittner, Elena Cabrio, Bruno Gas, Perfecto Herrera Boyer, Antoine Liutkus
Subjects: Sound (cs.SD); Databases (cs.DB); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[99]  arXiv:2104.14067 [pdf]
Title: Improving Fairness in Speaker Recognition
Comments: Accepted at the 2020 European Symposium on Software Engineering (ESSE 2020)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[100]  arXiv:2104.14297 [pdf, other]
Title: End-to-End Speech Recognition from Federated Acoustic Models
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[101]  arXiv:2104.14468 [pdf, other]
Title: Star DGT: a Robust Gabor Transform for Speech Denoising
Comments: arXiv admin note: text overlap with arXiv:2103.11233
Subjects: Sound (cs.SD); Information Theory (cs.IT); Audio and Speech Processing (eess.AS)
[102]  arXiv:2104.00239 (cross-list from cs.CV) [pdf, other]
Title: Positive Sample Propagation along the Audio-Visual Event Line
Comments: Accepted to CVPR 2021. Code is available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[103]  arXiv:2104.00315 (cross-list from cs.CV) [pdf, other]
Title: Unsupervised Sound Localization via Iterative Contrastive Learning
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[104]  arXiv:2104.00824 (cross-list from cs.CL) [pdf]
Title: Tusom2021: A Phonetically Transcribed Speech Dataset from an Endangered Language for Universal Phone Recognition Experiments
Comments: 4 pages, 3 figures
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[105]  arXiv:2104.01378 (cross-list from cs.CL) [pdf, other]
Title: speechocean762: An Open-Source Non-native English Speech Corpus For Pronunciation Assessment
Comments: Accepted in INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[106]  arXiv:2104.02000 (cross-list from cs.CV) [pdf, other]
Title: Can audio-visual integration strengthen robustness under multimodal attacks?
Comments: CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[107]  arXiv:2104.02026 (cross-list from cs.CV) [pdf, other]
Title: Cyclic Co-Learning of Sounding Object Visual Grounding and Sound Separation
Comments: CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[108]  arXiv:2104.02410 (cross-list from cs.SE) [pdf, other]
Title: Using Voice and Biofeedback to Predict User Engagement during Requirements Interviews
Comments: This paper contains updated experimental results with respect to the initial version
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[109]  arXiv:2104.02588 (cross-list from cs.CE) [pdf]
Title: Principal Component Analysis Applied to Gradient Fields in Band Gap Optimization Problems for Metamaterials
Subjects: Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[110]  arXiv:2104.02606 (cross-list from cs.CV) [pdf, other]
Title: Weakly-supervised Audio-visual Sound Source Detection and Separation
Comments: 4 figures, 6 pages
Journal-ref: IEEE International Conference on Multimedia and Expo (ICME) 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[111]  arXiv:2104.02656 (cross-list from cs.CV) [pdf, other]
Title: Collaborative Learning to Generate Audio-Video Jointly
Comments: ICASSP 2021 (Accepted)
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[112]  arXiv:2104.02775 (cross-list from cs.CV) [pdf, other]
Title: Looking into Your Speech: Learning Cross-modal Affinity for Audio-visual Speech Separation
Comments: CVPR 2021. The first two authors contributed equally to this work. Project page: this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[113]  arXiv:2104.03123 (cross-list from cs.LG) [pdf, other]
Title: Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection
Comments: Accepted to INTERSPEECH 2021
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[114]  arXiv:2104.03725 (cross-list from cs.LG) [pdf, other]
Title: On tuning consistent annealed sampling for denoising score matching
Comments: 3 pages and 1 figure
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[115]  arXiv:2104.03815 (cross-list from cs.CL) [pdf, other]
Title: Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[116]  arXiv:2104.03842 (cross-list from cs.CL) [pdf, other]
Title: RNN Transducer Models For Spoken Language Understanding
Comments: To appear in the proceedings of ICASSP 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[117]  arXiv:2104.04091 (cross-list from cs.CL) [pdf, other]
Title: Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects
Comments: 5
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[118]  arXiv:2104.04487 (cross-list from cs.CL) [pdf, other]
Title: Language model fusion for streaming end to end speech recognition
Comments: 5 pages
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[119]  arXiv:2104.04552 (cross-list from cs.CL) [pdf, other]
Title: Lookup-Table Recurrent Language Models for Long Tail Speech Recognition
Comments: Presented as conference paper at Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[120]  arXiv:2104.04805 (cross-list from cs.CL) [pdf]
Title: Non-autoregressive Transformer-based End-to-end ASR using BERT
Journal-ref: in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 1474-1482, 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[121]  arXiv:2104.04950 (cross-list from cs.CL) [pdf]
Title: Innovative Bert-based Reranking Language Models for Speech Recognition
Comments: 6 pages, 3 figures, Published in IEEE SLT 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[122]  arXiv:2104.05055 (cross-list from cs.CL) [pdf, other]
Title: NeMo Inverse Text Normalization: From Development To Production
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[123]  arXiv:2104.05418 (cross-list from cs.LG) [pdf, other]
Title: Contrastive Learning of Global-Local Video Representations
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[124]  arXiv:2104.05488 (cross-list from cs.CL) [pdf, other]
Title: CNN Encoding of Acoustic Parameters for Prominence Detection
Comments: 5 pages, 2 figures, 6 tables, Submitted to INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[125]  arXiv:2104.05507 (cross-list from cs.CL) [pdf, other]
Title: BART based semantic correction for Mandarin automatic speech recognition system
Comments: submitted to INTERSPEECH2021
Journal-ref: Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[126]  arXiv:2104.05544 (cross-list from cs.CL) [pdf, ps, other]
Title: Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models
Comments: accepted to Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[127]  arXiv:2104.05752 (cross-list from cs.CL) [pdf, other]
Title: Speak or Chat with Me: End-to-End Spoken Language Understanding System with Flexible Inputs
Comments: Accepted to Interspeech 2021
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[128]  arXiv:2104.05980 (cross-list from cs.CL) [pdf, other]
Title: Experiments of ASR-based mispronunciation detection for children and adult English learners
Comments: Submitted to INTERSPEECH2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[129]  arXiv:2104.06457 (cross-list from cs.CL) [pdf, other]
Title: Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation
Comments: Accepted at NAACL-HLT 2021 (short paper)
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[130]  arXiv:2104.06835 (cross-list from cs.CL) [pdf, other]
Title: Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Comments: Submitted to Interspeech 2022
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[131]  arXiv:2104.07253 (cross-list from cs.CL) [pdf, other]
Title: Integration of Pre-trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding
Comments: Accepted for ICASSP 2022
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[132]  arXiv:2104.08510 (cross-list from cs.MM) [pdf, other]
Title: Exploring Deep Learning for Joint Audio-Visual Lip Biometrics
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[133]  arXiv:2104.09641 (cross-list from cs.LG) [pdf, ps, other]
Title: A New Class of Efficient Adaptive Filters for Online Nonlinear Modeling
Comments: This work has been accepted for publication in IEEE Transactions on Systems, Man, and Cybernetics: Systems. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP); Systems and Control (eess.SY)
[134]  arXiv:2104.10116 (cross-list from cs.MM) [pdf, other]
Title: Detection of Audio-Video Synchronization Errors Via Event Detection
Comments: To be published in ICASSP 2021
Subjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[135]  arXiv:2104.10299 (cross-list from cs.GR) [pdf, other]
Title: Voice2Mesh: Cross-Modal 3D Face Model Generation from Voices
Comments: Project page: this https URL
Subjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[136]  arXiv:2104.10507 (cross-list from cs.CL) [pdf, ps, other]
Title: On Sampling-Based Training Criteria for Neural Language Modeling
Comments: Accepted at INTERSPEECH 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[137]  arXiv:2104.10747 (cross-list from cs.CL) [pdf, ps, other]
Title: Accented Speech Recognition: A Survey
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[138]  arXiv:2104.11070 (cross-list from cs.CL) [pdf, other]
Title: Adapting Long Context NLM for ASR Rescoring in Conversational Agents
Comments: Accepted to Interspeech 2021. arXiv admin note: text overlap with arXiv:2103.10325
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[139]  arXiv:2104.11116 (cross-list from cs.CV) [pdf, other]
Title: Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation
Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021. Code and models are available at this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[140]  arXiv:2104.11127 (cross-list from cs.CL) [pdf, other]
Title: Fast Text-Only Domain Adaptation of RNN-Transducer Prediction Network
Authors: Janne Pylkkönen (1), Antti Ukkonen (1 and 2), Juho Kilpikoski (1), Samu Tamminen (1), Hannes Heikinheimo (1) ((1) Speechly, (2) Department of Computer Science, University of Helsinki, Finland)
Comments: 5 pages, 2 figures. Accepted to Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[141]  arXiv:2104.11348 (cross-list from cs.CL) [pdf, other]
Title: Earnings-21: A Practical Benchmark for ASR in the Wild
Comments: Accepted to INTERSPEECH 2021. June 15 2021: Addressing the comments of reviewers and updating the results of our internal ESPNet model. The results do not change our conclusions. April 28th, 2021: We found and resolved an issue in our experimental evaluation that scored the LibriSpeech model at ~20% worse relative WER than the actual WER. The updated results do not affect our conclusions
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[142]  arXiv:2104.11462 (cross-list from cs.CL) [pdf, ps, other]
Title: LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
Comments: Will be presented at Interspeech 2021
Journal-ref: Proc. Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[143]  arXiv:2104.11946 (cross-list from cs.LG) [pdf, other]
Title: Aligned Contrastive Predictive Coding
Comments: Published in Interspeech 2021
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[144]  arXiv:2104.13225 (cross-list from cs.AI) [pdf, other]
Title: Visually grounded models of spoken language: A survey of datasets, architectures and evaluation techniques
Journal-ref: Journal of Artificial Intelligence Research 73 (2022) 673-707
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[145]  arXiv:2104.13332 (cross-list from cs.LG) [pdf, other]
Title: End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Comments: Published in IEEE Transactions on Cybernetics (April 2022)
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[146]  arXiv:2104.14346 (cross-list from cs.CL) [pdf, other]
Title: Bridging the gap between streaming and non-streaming ASR systems by distilling ensembles of CTC and RNN-T models
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[147]  arXiv:2104.14470 (cross-list from cs.CL) [pdf, other]
Title: Impact of Encoding and Segmentation Strategies on End-to-End Simultaneous Speech Translation
Comments: Accepted for presentation at Interspeech 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[148]  arXiv:2104.14802 (cross-list from cs.MM) [pdf, other]
Title: Dance Generation with Style Embedding: Learning and Transferring Latent Representations of Dance Styles
Comments: Submitted to IJCAI-21
Subjects: Multimedia (cs.MM); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[149]  arXiv:2104.14830 (cross-list from cs.CL) [pdf, other]
Title: Scaling End-to-End Models for Large-Scale Multilingual ASR
Comments: ASRU 2021
Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[150]  arXiv:2104.00120 (cross-list from eess.AS) [pdf, ps, other]
Title: Multi-Encoder Learning and Stream Fusion for Transformer-Based End-to-End Automatic Speech Recognition
Comments: accepted at INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[151]  arXiv:2104.00259 (cross-list from eess.AS) [pdf, other]
Title: Interactive spatial speech recognition maps based on simulated speech recognition experiments
Comments: 16 pages, 11 figures, related code this https URL
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[152]  arXiv:2104.00769 (cross-list from eess.AS) [pdf, other]
Title: Keyword Transformer: A Self-Attention Model for Keyword Spotting
Comments: Proceedings of INTERSPEECH
Journal-ref: Proc. Interspeech 2021, 4249-4253
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[153]  arXiv:2104.00931 (cross-list from eess.AS) [pdf, other]
Title: Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[154]  arXiv:2104.00960 (cross-list from eess.AS) [pdf, other]
Title: INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing
Comments: 5 pages, submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[155]  arXiv:2104.00994 (cross-list from eess.AS) [pdf, other]
Title: Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation
Comments: Accepted for publication in INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[156]  arXiv:2104.01227 (cross-list from eess.AS) [pdf, other]
Title: MetricNet: Towards Improved Modeling For Non-Intrusive Speech Quality Assessment
Comments: Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[157]  arXiv:2104.01320 (cross-list from eess.AS) [pdf, other]
Title: An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems
Comments: 5 pages, 6 figures, in Proc. INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[158]  arXiv:2104.01409 (cross-list from eess.AS) [pdf, other]
Title: Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[159]  arXiv:2104.01541 (cross-list from eess.AS) [pdf, other]
Title: Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[160]  arXiv:2104.01923 (cross-list from eess.SP) [pdf, other]
Title: Real-time Streaming Wave-U-Net with Temporal Convolutions for Multichannel Speech Enhancement
Comments: Draft paper for InterSpeech 2021 processing
Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[161]  arXiv:2104.01954 (cross-list from eess.AS) [pdf, other]
Title: Reformulating DOVER-Lap Label Mapping as a Graph Partitioning Problem
Comments: 5 pages, 3 figures. Accepted at INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[162]  arXiv:2104.02017 (cross-list from eess.AS) [pdf, other]
Title: Efficient Personalized Speech Enhancement through Self-Supervised Learning
Comments: 15 pages, 9 figures, published in IEEE JSTSP 2022
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[163]  arXiv:2104.02018 (cross-list from eess.AS) [pdf, other]
Title: Personalized Speech Enhancement through Self-Supervised Data Augmentation and Purification
Comments: 5 pages, 3 figures, under review
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[164]  arXiv:2104.02125 (cross-list from eess.AS) [pdf, other]
Title: SpeakerStew: Scaling to Many Languages with a Triaged Multilingual Text-Dependent and Text-Independent Speaker Verification System
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[165]  arXiv:2104.02128 (cross-list from eess.AS) [pdf, other]
Title: End-to-End Speaker-Attributed ASR with Transformer
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[166]  arXiv:2104.02518 (cross-list from eess.AS) [pdf, other]
Title: An Initial Investigation for Detecting Partially Spoofed Audio
Comments: INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[167]  arXiv:2104.02724 (cross-list from eess.AS) [pdf, other]
Title: Relaxing the Conditional Independence Assumption of CTC-based ASR by Conditioning on Intermediate Predictions
Comments: Accepted to INTERSPEECH2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[168]  arXiv:2104.02757 (cross-list from eess.AS) [pdf, other]
Title: Exploring Targeted Universal Adversarial Perturbations to End-to-end ASR Models
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[169]  arXiv:2104.02819 (cross-list from eess.AS) [pdf, other]
Title: Learning to Rank Microphones for Distant Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[170]  arXiv:2104.02858 (cross-list from eess.AS) [pdf, other]
Title: Capturing Multi-Resolution Context by Dilated Self-Attention
Comments: In Proc. ICASSP 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[171]  arXiv:2104.02878 (cross-list from eess.AS) [pdf, other]
Title: Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network
Comments: 5 pages, 2 figures, 4 tables, submitted to Interspeech as a conference paper
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[172]  arXiv:2104.02879 (cross-list from eess.AS) [pdf, other]
Title: Adapting Speaker Embeddings for Speaker Diarisation
Comments: 5 pages, 2 figures, 3 tables, submitted to Interspeech as a conference paper
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[173]  arXiv:2104.02882 (cross-list from eess.AS) [pdf, other]
Title: FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization
Comments: Submitted to INTERSPEECH2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[174]  arXiv:2104.02901 (cross-list from eess.AS) [pdf, other]
Title: S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations
Comments: Accepted by INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[175]  arXiv:2104.03004 (cross-list from eess.AS) [pdf, ps, other]
Title: Siamese Neural Network with Joint Bayesian Model Structure for Speaker Verification
Comments: arXiv admin note: substantial text overlap with arXiv:2101.03329
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[176]  arXiv:2104.03009 (cross-list from eess.AS) [pdf, other]
Title: The AS-NU System for the M2VoC Challenge
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[177]  arXiv:2104.03017 (cross-list from eess.AS) [pdf, other]
Title: Utilizing Self-supervised Representations for MOS Prediction
Comments: In Proceedings of Interspeech 2021. We acknowledge the support of AWS Machine Learning Research Awards program. Source code available at this https URL
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[178]  arXiv:2104.03074 (cross-list from eess.AS) [pdf, ps, other]
Title: Audio declipping performance enhancement via crossfading
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[179]  arXiv:2104.03416 (cross-list from eess.AS) [pdf, ps, other]
Title: Pushing the Limits of Non-Autoregressive Speech Recognition
Comments: Proceedings of INTERSPEECH
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[180]  arXiv:2104.03654 (cross-list from eess.AS) [pdf, other]
Title: Graph Attention Networks for Anti-Spoofing
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[181]  arXiv:2104.03759 (cross-list from eess.AS) [pdf, other]
Title: Phoneme-based Distribution Regularization for Speech Enhancement
Comments: ICASSP 2021 (Accepted)
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[182]  arXiv:2104.03899 (cross-list from eess.AS) [pdf, other]
Title: Unsupervised Speech Representation Learning for Behavior Modeling using Triplet Enhanced Contextualized Networks
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[183]  arXiv:2104.04045 (cross-list from eess.AS) [pdf, other]
Title: End-to-end speaker segmentation for overlap-aware resegmentation
Comments: Camera-ready version for Interspeech 2021 with significantly better voice activity detection, overlapped speech detection, and speaker diarization results. The code used for results reported in v1 contained a small bug that has now been fixed
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[184]  arXiv:2104.04298 (cross-list from eess.AS) [pdf, other]
Title: On Architectures and Training for Raw Waveform Feature Extraction in ASR
Comments: Accepted for ASRU 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[185]  arXiv:2104.04627 (cross-list from eess.AS) [pdf, other]
Title: Accented Speech Recognition Inspired by Human Perception
Authors: Xiangyun Chu (1), Elizabeth Combs (1), Amber Wang (1), Michael Picheny (2) ((1) Center for Data Science, New York University, (2) Courant Computer Science and Center for Data Science, New York University)
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[186]  arXiv:2104.04896 (cross-list from eess.AS) [pdf]
Title: A Toolbox for Construction and Analysis of Speech Datasets
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[187]  arXiv:2104.05017 (cross-list from eess.AS) [pdf, other]
Title: Estimating articulatory movements in speech production with transformer networks
Comments: accepted for oral presentation at INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[188]  arXiv:2104.05079 (cross-list from eess.AS) [pdf, other]
Title: Comparison of Binaural RTF-Vector-Based Direction of Arrival Estimation Methods Exploiting an External Microphone
Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[189]  arXiv:2104.05267 (cross-list from eess.AS) [pdf, other]
Title: Complex Spectral Mapping With Attention Based Convolution Recurrent Neural Network for Speech Enhancement
Comments: Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[190]  arXiv:2104.05390 (cross-list from eess.AS) [pdf, other]
Title: Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search
Comments: submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[191]  arXiv:2104.05481 (cross-list from eess.AS) [pdf, other]
Title: Improvement of Noise-Robust Single-Channel Voice Activity Detection with Spatial Pre-processing
Comments: Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[192]  arXiv:2104.05499 (cross-list from eess.AS) [pdf, ps, other]
Title: L3DAS21 Challenge: Machine Learning for 3D Audio Signal Processing
Comments: Documentation paper for the L3DAS21 Challenge for IEEE MLSP 2021. Further information on www.l3das.com/mlsp2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[193]  arXiv:2104.05557 (cross-list from eess.AS) [pdf, other]
Title: SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Comments: Accepted at Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[194]  arXiv:2104.06604 (cross-list from eess.AS) [pdf, other]
Title: Learning Metrics from Mean Teacher: A Supervised Learning Method for Improving the Generalization of Speaker Verification System
Comments: 5 pages, 1 figure, 5 tables, submitted to Interspeech 2021 as a conference paper
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[195]  arXiv:2104.06798 (cross-list from eess.AS) [pdf, ps, other]
Title: Audio-based cough counting using independent subspace analysis
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[196]  arXiv:2104.07283 (cross-list from eess.AS) [pdf, other]
Title: Towards end-to-end F0 voice conversion based on Dual-GAN with convolutional wavelet kernels
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[197]  arXiv:2104.07288 (cross-list from eess.AS) [pdf, other]
Title: Speaker Attentive Speech Emotion Recognition
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[198]  arXiv:2104.07310 (cross-list from eess.AS) [pdf, other]
Title: Investigating the Utility of Multimodal Conversational Technology and Audiovisual Analytic Measures for the Assessment and Monitoring of Amyotrophic Lateral Sclerosis at Scale
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[199]  arXiv:2104.07326 (cross-list from eess.AS) [pdf, other]
Title: EnvGAN: Adversarial Synthesis of Environmental Sounds for Data Augmentation
Comments: Submitted to IEEE Transactions on Audio, Speech and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[200]  arXiv:2104.07388 (cross-list from eess.AS) [pdf, other]
Title: Conditional independence for pretext task selection in Self-supervised speech representation learning
Comments: 5 pages, Accepted for presentation at Interspeech2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[201]  arXiv:2104.07474 (cross-list from eess.AS) [pdf, other]
Title: EAT: Enhanced ASR-TTS for Self-supervised Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[202]  arXiv:2104.08459 (cross-list from eess.AS) [pdf, other]
Title: KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset
Comments: 5 pages, 4 tables, 2 figures, accepted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[203]  arXiv:2104.08499 (cross-list from eess.AS) [pdf, other]
Title: Multi-Metric Optimization using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement
Comments: Accepted to IEEE/ACM Transactions on Audio Speech and Language Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[204]  arXiv:2104.09064 (cross-list from eess.AS) [pdf, other]
Title: Automatic Stroke Classification of Tabla Accompaniment in Hindustani Vocal Concert Audio
Comments: To appear in the JOURNAL OF ACOUSTICAL SOCIETY OF INDIA, April 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[205]  arXiv:2104.09356 (cross-list from eess.AS) [pdf, other]
Title: Detecting cognitive decline using speech only: The ADReSSo Challenge
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[206]  arXiv:2104.09456 (cross-list from eess.AS) [pdf, other]
Title: Self-supervised Representation Learning With Path Integral Clustering For Speaker Diarization
Comments: 11 pages, Accepted in IEEE Transactions on Audio, Speech and Language Processing
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[207]  arXiv:2104.09482 (cross-list from eess.AS) [pdf, other]
Title: Fusing information streams in end-to-end audio-visual speech recognition
Comments: 5 pages
Journal-ref: Published in International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[208]  arXiv:2104.09494 (cross-list from eess.AS) [pdf, other]
Title: NISQA: A Deep CNN-Self-Attention Model for Multidimensional Speech Quality Prediction with Crowdsourced Datasets
Comments: Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD)
[209]  arXiv:2104.09615 (cross-list from eess.AS) [pdf]
Title: Robust parameter design for Wiener-based binaural noise reduction methods in hearing aids
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[210]  arXiv:2104.10001 (cross-list from eess.AS) [pdf, ps, other]
Title: Comparison of remote experiments using crowdsourcing and laboratory experiments on speech intelligibility
Comments: This paper was submitted to Interspeech2021
Journal-ref: Proc. Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[211]  arXiv:2104.10217 (cross-list from eess.AS) [pdf, other]
Title: Bias-Aware Loss for Training Image and Speech Quality Prediction Models from Multiple Datasets
Comments: Accepted at QoMEX 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Image and Video Processing (eess.IV)
[212]  arXiv:2104.10328 (cross-list from eess.AS) [pdf, ps, other]
Title: Label-Synchronous Speech-to-Text Alignment for ASR Using Forward and Backward Transformers
Comments: Submitted to INTERSPEECH 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[213]  arXiv:2104.10757 (cross-list from eess.AS) [pdf, other]
Title: Scene-aware Far-field Automatic Speech Recognition
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[214]  arXiv:2104.10764 (cross-list from eess.AS) [pdf, other]
Title: HMM-Free Encoder Pre-Training for Streaming RNN Transducer
Comments: Accepted by Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[215]  arXiv:2104.10832 (cross-list from eess.AS) [pdf, other]
Title: Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss
Comments: Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[216]  arXiv:2104.11033 (cross-list from eess.AS) [pdf, other]
Title: Nonlinear Spatial Filtering in Multichannel Speech Enhancement
Comments: Accepted version, 11 pages, 6 figures
Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 29, 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[217]  arXiv:2104.11038 (cross-list from eess.AS) [pdf, other]
Title: Voice Privacy with Smart Digital Assistants in Educational Settings
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Cryptography and Security (cs.CR); Sound (cs.SD)
[218]  arXiv:2104.12870 (cross-list from eess.AS) [pdf, other]
Title: Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction
Comments: Submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[219]  arXiv:2104.13069 (cross-list from eess.AS) [pdf, other]
Title: Visualization of Linear Operations in the Spherical Harmonics Domain
Comments: Pre-print/author version of paper presented at International Conference on Immersive and 3D Audio (I3DA), Sept. 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[220]  arXiv:2104.13168 (cross-list from eess.AS) [pdf, other]
Title: dEchorate: a Calibrated Room Impulse Response Database for Echo-aware Signal Processing
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[221]  arXiv:2104.13247 (cross-list from eess.AS) [pdf, other]
Title: IATos: AI-powered pre-screening tool for COVID-19 from cough audio samples
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[222]  arXiv:2104.13347 (cross-list from eess.AS) [pdf, other]
Title: BeamLearning: an end-to-end Deep Learning approach for the angular localization of sound sources using raw multichannel acoustic pressure data
Comments: The following article has been submitted to the special issue on Machine Learning in Acoustics in JASA.
Journal-ref: J. Acoust. Soc. Am. 149 (6), June 2021
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[223]  arXiv:2104.13423 (cross-list from eess.AS) [pdf]
Title: DASEE A Synthetic Database of Domestic Acoustic Scenes and Events in Dementia Patients Environment
Comments: 5 pages, 4 figures, 6 tables
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD); Signal Processing (eess.SP)
[224]  arXiv:2104.13553 (cross-list from eess.AS) [pdf, other]
Title: AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries
Comments: 10 pages, 8 figures, 3 tables, under review at ACMMM 21
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[225]  arXiv:2104.13620 (cross-list from eess.AS) [pdf, other]
Title: IDMT-Traffic: An Open Benchmark Dataset for Acoustic Traffic Monitoring Research
Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Sound (cs.SD)
[226]  arXiv:2104.13970 (cross-list from eess.AS) [pdf, other]
Title: Personalized Keyphrase Detection using Speaker and Environment Information
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[227]  arXiv:2104.14264 (cross-list from eess.AS) [pdf]
Title: Hardware-Friendly Synaptic Orders and Timescales in Liquid State Machines for Speech Classification
Subjects: Audio and Speech Processing (eess.AS); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Neurons and Cognition (q-bio.NC)
[228]  arXiv:2104.14791 (cross-list from eess.AS) [pdf, other]
Title: Deformable TDNN with adaptive receptive fields for speech recognition
Comments: 5 pages. submitted to Interspeech 2021
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[229]  arXiv:2104.14921 (cross-list from eess.AS) [pdf, ps, other]
Title: Crackle Detection In Lung Sounds Using Transfer Learning And Multi-Input Convolitional Neural Networks
Comments: Under Review in Proceeding of EMBC 2021
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
