Sound

Authors and titles for cs.SD in Nov 2021, skipping first 25

[ total of 197 entries: 1-50 | 26-75 | 76-125 | 126-175 | 176-197 ]
[26]  arXiv:2111.04436 [pdf, other]
Title: SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27]  arXiv:2111.04988 [pdf, other]
Title: Ultra-Low Power Keyword Spotting at the Edge
Comments: 5 pages, 5 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[28]  arXiv:2111.05095 [pdf, other]
Title: Speaker Generation
Comments: 12 pages, 3 figures, 4 tables, appendix with 2 tables
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[29]  arXiv:2111.05174 [pdf, other]
Title: CAESynth: Real-Time Timbre Interpolation and Pitch Control with Conditional Autoencoders
Comments: MLSP 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30]  arXiv:2111.05501 [pdf, other]
Title: Inclusive Speaker Verification with Adaptive thresholding
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[31]  arXiv:2111.05592 [pdf, other]
Title: Improving the Chamberlin Digital State Variable Filter
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[32]  arXiv:2111.05846 [pdf, other]
Title: Structure from Silence: Learning Scene Structure from Ambient Sound
Comments: Accepted to CoRL 2021 (Oral Presentation)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[33]  arXiv:2111.05895 [pdf, other]
Title: A Generic Deep Learning Based Cough Analysis System from Clinically Validated Samples for Point-of-Need Covid-19 Test and Severity Levels
Journal-ref: IEEE Transactions on Services Computing (2021)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[34]  arXiv:2111.06046 [pdf, other]
Title: Music Score Expansion with Variable-Length Infilling
Comments: To be published as a late-breaking demo paper at ISMIR 2021
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[35]  arXiv:2111.06316 [pdf, other]
Title: Unsupervised Noise Adaptive Speech Enhancement by Discriminator-Constrained Optimal Transport
Comments: Accepted at NeurIPS 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[36]  arXiv:2111.06331 [pdf, other]
Title: Towards an Efficient Voice Identification Using Wav2Vec2.0 and HuBERT Based on the Quran Reciters Dataset
Comments: 5 pages, 9 figures, 2 tables
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[37]  arXiv:2111.06531 [pdf, other]
Title: Domain Generalization on Efficient Acoustic Scene Classification using Residual Normalization
Comments: Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38]  arXiv:2111.06625 [pdf]
Title: A Convolutional Neural Network Based Approach to Recognize Bangla Spoken Digits from Speech Signal
Comments: 4 pages, 5 figures, 2021 International Conference on Electronics, Communications and Information Technology (ICECIT), 14 to 16 September 2021, Khulna, Bangladesh
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
[39]  arXiv:2111.06643 [pdf, other]
Title: Fully Automatic Page Turning on Real Scores
Comments: ISMIR 2021 Late Breaking/Demo
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[40]  arXiv:2111.07094 [pdf]
Title: Speech Emotion Recognition Using Deep Sparse Auto-Encoder Extreme Learning Machine with a New Weighting Scheme and Spectro-Temporal Features Along with Classical Feature Selection and A New Quantum-Inspired Dimension Reduction Method
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[41]  arXiv:2111.07116 [pdf, other]
Title: Direct Noisy Speech Modeling for Noisy-to-Noisy Voice Conversion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42]  arXiv:2111.07234 [pdf]
Title: Speech Emotion Recognition System by Quaternion Nonlinear Echo State Network
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43]  arXiv:2111.07518 [pdf, other]
Title: Time-Frequency Attention for Monaural Speech Enhancement
Comments: 5 pages, 4 figures, Accepted and presented at ICASSP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44]  arXiv:2111.07657 [pdf]
Title: Symbolic Music Loop Generation with VQ-VAE
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[45]  arXiv:2111.07979 [pdf, other]
Title: Metric-based multimodal meta-learning for human movement identification via footstep recognition
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Systems and Control (eess.SY); Neurons and Cognition (q-bio.NC)
[46]  arXiv:2111.08196 [pdf, other]
Title: An Exploratory Study on Perceptual Spaces of the Singing Voice
Comments: In Proceedings of the 2020 Joint Conference on AI Music Creativity (CSMC-MuMe 2020), Stockholm, Sweden, October 15-19, 2020
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47]  arXiv:2111.08327 [pdf, other]
Title: Detecting acoustic reflectors using a robot's ego-noise
Authors: Usama Saqib (AAU), Antoine Deleforge (MULTISPEECH), Jesper Jensen (AAU)
Journal-ref: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Jun 2021, Toronto, Canada
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48]  arXiv:2111.08462 [pdf, other]
Title: Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations
Comments: Accepted to "Deep Generative Models and Downstream Applications" (Oral) and "Machine Learning for Creativity and Design" (Poster) workshops at NeurIPS 2021
Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[49]  arXiv:2111.08839 [pdf, other]
Title: Zero-shot Singing Technique Conversion
Comments: In Proceedings of the 15th International Symposium on Computer Music Multidisciplinary Research (CMMR 2021), Tokyo, Japan, November 15-16, 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2111.08910 [pdf, other]
Title: Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[51]  arXiv:2111.09014 [pdf]
Title: Subject Enveloped Deep Sample Fuzzy Ensemble Learning Algorithm of Parkinson's Speech Data
Comments: 18 pages, 4 figures
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[52]  arXiv:2111.09052 [pdf, other]
Title: High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency
Comments: Proceedings of INTERSPEECH 2020
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[53]  arXiv:2111.09075 [pdf, ps, other]
Title: Cross-lingual Low Resource Speaker Adaptation Using Phonological Features
Comments: Proceedings of INTERSPEECH 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[54]  arXiv:2111.09146 [pdf, other]
Title: Rapping-Singing Voice Synthesis based on Phoneme-level Prosody Control
Comments: Proceedings of 11th ISCA Speech Synthesis Workshop (SSW 11)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[55]  arXiv:2111.09642 [pdf, other]
Title: Towards Intelligibility-Oriented Audio-Visual Speech Enhancement
Comments: 6 pages, 4 figures
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[56]  arXiv:2111.09931 [pdf, other]
Title: DawDreamer: Bridging the Gap Between Digital Audio Workstations and Python Interfaces
Authors: David Braun
Comments: 3 pages with 0 figures. Included in the Late-Breaking Demo Session of the 22nd International Society for Music Information Retrieval Conference
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[57]  arXiv:2111.10003 [pdf, other]
Title: Differentiable Wavetable Synthesis
Comments: Accepted by ICASSP 2022, Demo: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[58]  arXiv:2111.10168 [pdf, other]
Title: Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control
Comments: Proceedings of SPECOM 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[59]  arXiv:2111.10173 [pdf, other]
Title: Word-Level Style Control for Expressive, Non-attentive Speech Synthesis
Comments: Proceedings of SPECOM 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[60]  arXiv:2111.10177 [pdf, other]
Title: Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Comments: Proceedings of ICASSP 2021
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[61]  arXiv:2111.10235 [pdf, other]
Title: Interpreting deep urban sound classification using Layer-wise Relevance Propagation
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[62]  arXiv:2111.10592 [pdf, other]
Title: Deep Spoken Keyword Spotting: An Overview
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[63]  arXiv:2111.10639 [pdf, other]
Title: Implicit Acoustic Echo Cancellation for Keyword Spotting and Device-Directed Speech Detection
Comments: Submitted to INTERSPEECH 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[64]  arXiv:2111.10783 [pdf]
Title: Automatic Detection of Depression from Stratified Samples of Audio Data
Comments: 30 pages, 6 figures
Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[65]  arXiv:2111.10897 [pdf, other]
Title: Health Monitoring of Industrial machines using Scene-Aware Threshold Selection
Comments: 5 pages, 4 figures, 1 Table
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[66]  arXiv:2111.11023 [pdf, ps, other]
Title: Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[67]  arXiv:2111.11063 [pdf, other]
Title: Comparing the Accuracy of Deep Neural Networks (DNN) and Convolutional Neural Network (CNN) in Music Genre Recognition (MGR): Experiments on Kurdish Music
Comments: 8 pages, 5 figures, 3 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[68]  arXiv:2111.11636 [pdf]
Title: Music Classification: Beyond Supervised Learning, Towards Real-world Applications
Comments: This is a web book written for a tutorial session of the 22nd International Society for Music Information Retrieval Conference, Nov 8-12, 2021. Please visit this https URL for the original, web book format
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[69]  arXiv:2111.11737 [pdf]
Title: ADTOF: A large dataset of non-synthetic music for automatic drum transcription
Comments: Proceedings of the 22nd International Society for Music Information Retrieval Conference, ISMIR, Online, pp. 818-824
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[70]  arXiv:2111.11755 [pdf, other]
Title: Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance
Comments: 15 pages, 5 figures, ICML'2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[71]  arXiv:2111.11773 [pdf, other]
Title: Upsampling layers for music source separation
Comments: Demo page: this http URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[72]  arXiv:2111.11859 [pdf]
Title: Longitudinal Speech Biomarkers for Automated Alzheimer's Detection
Journal-ref: Frontiers in Computer Science, 08 April 2021
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[73]  arXiv:2111.12124 [pdf, ps, other]
Title: Towards Learning Universal Audio Representations
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[74]  arXiv:2111.12324 [pdf, other]
Title: How Speech is Recognized to Be Emotional - A Study Based on Information Decomposition
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[75]  arXiv:2111.12326 [pdf, other]
Title: A Study on Decoupled Probabilistic Linear Discriminant Analysis
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)