Sound

Authors and titles for cs.SD in Jun 2022

Showing entries 1-50 of 221.
[1]  arXiv:2206.00208 [pdf, other]
Title: AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation
Comments: Accepted by ISCSLP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2]  arXiv:2206.00393 [pdf, other]
Title: Towards Generalisable Audio Representations for Audio-Visual Navigation
Comments: CVPR 2022 Embodied AI Workshop
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
[3]  arXiv:2206.00454 [pdf, other]
Title: Towards Context-Aware Neural Performance-Score Synchronisation
Authors: Ruchit Agrawal
Comments: PhD Thesis, Queen Mary University of London (190 pages)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
[4]  arXiv:2206.00635 [pdf, other]
Title: Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition
Journal-ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[5]  arXiv:2206.00901 [pdf]
Title: Musical Instrument Recognition by XGBoost Combining Feature Fusion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[6]  arXiv:2206.01071 [pdf, other]
Title: Partitura: A Python Package for Symbolic Music Processing
Journal-ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[7]  arXiv:2206.01104 [pdf, other]
Title: The match file format: Encoding Alignments between Scores and Performances
Journal-ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
[8]  arXiv:2206.01305 [pdf, other]
Title: The Musical Arrow of Time -- The Role of Temporal Asymmetry in Music and Its Organicist Implications
Authors: Qi Xu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[9]  arXiv:2206.01542 [pdf, other]
Title: Detecting the Severity of Major Depressive Disorder from Speech: A Novel HARD-Training Methodology
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
[10]  arXiv:2206.02211 [pdf, other]
Title: Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
Comments: Submitted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
[11]  arXiv:2206.02246 [pdf, other]
Title: Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
Comments: Accepted to Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[12]  arXiv:2206.02284 [pdf, other]
Title: Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator
Comments: MICCAI 2022 (early accept, Oral Presentation ~3%)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[13]  arXiv:2206.02671 [pdf, ps, other]
Title: Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Future Audio-Visual Hearing Aids
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[14]  arXiv:2206.03065 [pdf, other]
Title: Universal Speech Enhancement with Score-based Diffusion
Comments: 24 pages, 6 figures; includes appendix; examples in this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15]  arXiv:2206.03351 [pdf, other]
Title: AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[16]  arXiv:2206.03393 [pdf, other]
Title: Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[17]  arXiv:2206.04006 [pdf, other]
Title: Few-Shot Audio-Visual Learning of Environment Acoustics
Comments: Accepted to NeurIPS 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18]  arXiv:2206.04658 [pdf, other]
Title: BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Comments: Listen to audio samples from BigVGAN at: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[19]  arXiv:2206.04769 [pdf, other]
Title: CLAP: Learning Audio Concepts From Natural Language Supervision
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20]  arXiv:2206.04780 [pdf, other]
Title: Speak Like a Dog: Human to Non-human creature Voice Conversion
Comments: 5 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21]  arXiv:2206.04805 [pdf, other]
Title: Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022
Comments: Submitted to CEUR-WS under LifeCLEF for the BirdCLEF 2022 challenge as a working note
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22]  arXiv:2206.04962 [pdf, other]
Title: Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation
Comments: arXiv admin note: text overlap with arXiv:2112.11142
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23]  arXiv:2206.04984 [pdf, other]
Title: Zero-Shot Audio Classification using Image Embeddings
Comments: Accepted to the European Signal Processing Conference (EUSIPCO) 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24]  arXiv:2206.05018 [pdf, ps, other]
Title: Going Beyond the Cookie Theft Picture Test: Detecting Cognitive Impairments using Acoustic Features
Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[25]  arXiv:2206.05286 [src]
Title: AHD ConvNet for Speech Emotion Classification
Comments: Wrong authors quoted
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[26]  arXiv:2206.05408 [pdf, other]
Title: Multi-instrument Music Synthesis with Spectrogram Diffusion
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27]  arXiv:2206.05876 [pdf, other]
Title: Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques
Comments: arXiv admin note: substantial text overlap with arXiv:2106.04492
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[28]  arXiv:2206.05929 [pdf, other]
Title: Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure
Comments: 5 pages, 3 figures, 3 tables, EUSIPCO 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[29]  arXiv:2206.06057 [pdf, ps, other]
Title: Low-complexity deep learning frameworks for acoustic scene classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[30]  arXiv:2206.06117 [pdf]
Title: Optimizing musical chord inversions using the cartesian coordinate system
Authors: Steve Mathew D A
Comments: 9 pages, 5 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31]  arXiv:2206.06126 [pdf, other]
Title: Robust Time Series Denoising with Learnable Wavelet Packet Transform
Comments: 15 pages, 13 figures, 8 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[32]  arXiv:2206.06573 [pdf, ps, other]
Title: Speech intelligibility of simulated hearing loss sounds and its prediction using the Gammachirp Envelope Similarity Index (GESI)
Comments: This paper was submitted to Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[33]  arXiv:2206.06604 [pdf, other]
Title: WHIS: Hearing impairment simulator based on the gammachirp auditory filterbank
Authors: Toshio Irino
Comments: This paper was submitted to Trends in Hearing on Jun 5, 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[34]  arXiv:2206.06680 [pdf, other]
Title: Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction
Comments: Proceedings of the ICML Expressive Vocalizations Workshop and Competition held in conjunction with the 39th International Conference on Machine Learning, Copyright 2022 by the author(s)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[35]  arXiv:2206.06908 [pdf, other]
Title: LPCSE: Neural Speech Enhancement through Linear Predictive Coding
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36]  arXiv:2206.07176 [pdf, other]
Title: Frequency-centroid features for word recognition of non-native English speakers
Comments: Published in IEEE Irish Signals & Systems Conference (ISSC), 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[37]  arXiv:2206.07229 [pdf, other]
Title: Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
Comments: To appear in INTERSPEECH 2022. 5 pages, 4 figures. Substantial text overlap with arXiv:2110.03156
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[38]  arXiv:2206.07288 [pdf, other]
Title: Streaming non-autoregressive model for any-to-many voice conversion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39]  arXiv:2206.07289 [pdf, other]
Title: Text-Aware End-to-end Mispronunciation Detection and Diagnosis
Comments: Rejected by Interspeech2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[40]  arXiv:2206.07293 [pdf, other]
Title: FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement
Comments: The paper has been accepted by ICASSP 2022. 5 pages, 2 figures, 5 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41]  arXiv:2206.07340 [pdf, other]
Title: On the Design and Training Strategies for RNN-based Online Neural Speech Separation Systems
Authors: Kai Li, Yi Luo
Comments: 5 pages, 1 figure
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42]  arXiv:2206.07347 [pdf, other]
Title: On the Use of Deep Mask Estimation Module for Neural Source Separation Systems
Authors: Kai Li, Xiaolin Hu, Yi Luo
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[43]  arXiv:2206.07511 [pdf, other]
Title: Investigating Multi-Feature Selection and Ensembling for Audio Classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[44]  arXiv:2206.07860 [pdf, other]
Title: EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning
Comments: Accepted by IEEE Signal Processing Letters
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[45]  arXiv:2206.07956 [pdf, other]
Title: Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Comments: Accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[46]  arXiv:2206.08007 [pdf, ps, other]
Title: DCASE 2022: Comparative Analysis Of CNNs For Acoustic Scene Classification Under Low-Complexity Considerations
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[47]  arXiv:2206.08039 [pdf, ps, other]
Title: Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History
Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[48]  arXiv:2206.08170 [pdf, other]
Title: Adversarial Privacy Protection on Speech Enhancement
Comments: 5 pages, 6 figures
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[49]  arXiv:2206.08189 [pdf, other]
Title: Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50]  arXiv:2206.08233 [pdf, other]
Title: Event-related data conditioning for acoustic event classification
Comments: Accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)