Sound
Authors and titles for cs.SD in Jun 2022
[ total of 221 entries: showing 1-50 ]
- [1] arXiv:2206.00208 [pdf, other]
Title: AdaVITS: Tiny VITS for Low Computing Resource Speaker Adaptation
Authors: Kun Song, Heyang Xue, Xinsheng Wang, Jian Cong, Yongmao Zhang, Lei Xie, Bing Yang, Xiong Zhang, Dan Su
Comments: Accepted by ISCSLP 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [2] arXiv:2206.00393 [pdf, other]
Title: Towards Generalisable Audio Representations for Audio-Visual Navigation
Comments: CVPR 2022 Embodied AI Workshop
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Audio and Speech Processing (eess.AS)
- [3] arXiv:2206.00454 [pdf, other]
Title: Towards Context-Aware Neural Performance-Score Synchronisation
Authors: Ruchit Agrawal
Comments: PhD Thesis, Queen Mary University of London (190 pages)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- [4] arXiv:2206.00635 [pdf, other]
Title: Speech Artifact Removal from EEG Recordings of Spoken Word Production with Tensor Decomposition
Journal-ref: 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [5] arXiv:2206.00901 [pdf, ps, other]
Title: Musical Instrument Recognition by XGBoost Combining Feature Fusion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [6] arXiv:2206.01071 [pdf, other]
Title: Partitura: A Python Package for Symbolic Music Processing
Authors: Carlos Cancino-Chacón, Silvan David Peter, Emmanouil Karystinaios, Francesco Foscarin, Maarten Grachten, Gerhard Widmer
Journal-ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
- [7] arXiv:2206.01104 [pdf, other]
Title: The match file format: Encoding Alignments between Scores and Performances
Authors: Francesco Foscarin, Emmanouil Karystinaios, Silvan David Peter, Carlos Cancino-Chacón, Maarten Grachten, Gerhard Widmer
Journal-ref: Proceedings of the Music Encoding Conference (MEC), 2022, Halifax, Canada
Subjects: Sound (cs.SD); Digital Libraries (cs.DL); Audio and Speech Processing (eess.AS)
- [8] arXiv:2206.01305 [pdf, other]
Title: The Musical Arrow of Time -- The Role of Temporal Asymmetry in Music and Its Organicist Implications
Authors: Qi Xu
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [9] arXiv:2206.01542 [src]
Title: Detecting the Severity of Major Depressive Disorder from Speech: A Novel HARD-Training Methodology
Authors: Edward L. Campbell, Judith Dineley, Pauline Conde, Faith Matcham, Femke Lamers, Sara Siddi, Laura Docio-Fernandez, Carmen Garcia-Mateo, Nicholas Cummins, the RADAR-CNS Consortium
Comments: Error in Training Code
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Quantitative Methods (q-bio.QM)
- [10] arXiv:2206.02211 [pdf, other]
Title: Variable-rate hierarchical CPC leads to acoustic unit discovery in speech
Comments: Accepted to 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
Journal-ref: Advances in Neural Information Processing Systems, 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Audio and Speech Processing (eess.AS)
- [11] arXiv:2206.02246 [pdf, other]
Title: Zero-Shot Voice Conditioning for Denoising Diffusion TTS Models
Comments: Accepted to Interspeech 2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
- [12] arXiv:2206.02284 [pdf, other]
Title: Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator
Authors: Xiaofeng Liu, Fangxu Xing, Jerry L. Prince, Jiachen Zhuo, Maureen Stone, Georges El Fakhri, Jonghye Woo
Comments: MICCAI 2022 (early accept, Oral Presentation ~3%)
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [13] arXiv:2206.02671 [pdf, ps, other]
Title: Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Audio-Visual Hearing Aids
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
- [14] arXiv:2206.03065 [pdf, other]
Title: Universal Speech Enhancement with Score-based Diffusion
Comments: 24 pages, 6 figures; includes appendix; examples in this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [15] arXiv:2206.03351 [pdf, other]
Title: AS2T: Arbitrary Source-To-Target Adversarial Attack on Speaker Recognition Systems
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [16] arXiv:2206.03393 [pdf, other]
Title: Towards Understanding and Mitigating Audio Adversarial Examples for Speaker Recognition
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [17] arXiv:2206.04006 [pdf, other]
Title: Few-Shot Audio-Visual Learning of Environment Acoustics
Comments: Accepted to NeurIPS 2022
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [18] arXiv:2206.04658 [pdf, other]
Title: BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Comments: To appear at ICLR 2023. Listen to audio samples from BigVGAN at: this https URL
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [19] arXiv:2206.04769 [pdf, other]
Title: CLAP: Learning Audio Concepts From Natural Language Supervision
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [20] arXiv:2206.04780 [pdf, other]
Title: Speak Like a Dog: Human to Non-human creature Voice Conversion
Comments: 5 pages, 4 figures
Journal-ref: 2022 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) (pp. 1388-1393)
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [21] arXiv:2206.04805 [pdf, other]
Title: Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022
Comments: Submitted to CEUR-WS under LifeCLEF for the BirdCLEF 2022 challenge as a working note
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [22] arXiv:2206.04962 [pdf, other]
Title: Feature Learning and Ensemble Pre-Tasks Based Self-Supervised Speech Denoising and Dereverberation
Comments: arXiv admin note: text overlap with arXiv:2112.11142
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [23] arXiv:2206.04984 [pdf, other]
Title: Zero-Shot Audio Classification using Image Embeddings
Comments: Accepted to the European Signal Processing Conference (EUSIPCO) 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [24] arXiv:2206.05018 [pdf, ps, other]
Title: Going Beyond the Cookie Theft Picture Test: Detecting Cognitive Impairments using Acoustic Features
Authors: Franziska Braun, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Korbinian Riedhammer, Sebastian P. Bayerl
Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [25] arXiv:2206.05286 [src]
Title: AHD ConvNet for Speech Emotion Classification
Comments: Wrong authors quoted
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [26] arXiv:2206.05408 [pdf, other]
Title: Multi-instrument Music Synthesis with Spectrogram Diffusion
Authors: Curtis Hawthorne, Ian Simon, Adam Roberts, Neil Zeghidour, Josh Gardner, Ethan Manilow, Jesse Engel
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [27] arXiv:2206.05876 [pdf, other]
Title: Description and Discussion on DCASE 2022 Challenge Task 2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring Applying Domain Generalization Techniques
Authors: Kota Dohi, Keisuke Imoto, Noboru Harada, Daisuke Niizumi, Yuma Koizumi, Tomoya Nishida, Harsh Purohit, Takashi Endo, Masaaki Yamamoto, Yohei Kawaguchi
Comments: arXiv admin note: substantial text overlap with arXiv:2106.04492
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
- [28] arXiv:2206.05929 [pdf, other]
Title: Improvement of Serial Approach to Anomalous Sound Detection by Incorporating Two Binary Cross-Entropies for Outlier Exposure
Comments: 5 pages, 3 figures, 3 tables, EUSIPCO 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [29] arXiv:2206.06057 [pdf, ps, other]
Title: Low-complexity deep learning frameworks for acoustic scene classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [30] arXiv:2206.06117 [pdf, ps, other]
Title: Optimizing musical chord inversions using the cartesian coordinate system
Authors: Steve Mathew D A
Comments: 9 pages, 5 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [31] arXiv:2206.06126 [pdf, other]
Title: Robust Time Series Denoising with Learnable Wavelet Packet Transform
Comments: 15 pages, 13 figures, 8 tables
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [32] arXiv:2206.06573 [pdf, ps, other]
Title: Speech intelligibility of simulated hearing loss sounds and its prediction using the Gammachirp Envelope Similarity Index (GESI)
Comments: This preprint is a copy of the final version accepted for Interspeech 2022. See this https URL
Journal-ref: Proc. Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [33] arXiv:2206.06604 [pdf, other]
Title: WHIS: Hearing impairment simulator based on the gammachirp auditory filterbank
Authors: Toshio Irino
Comments: This preprint was an original version that was unsuccessfully submitted to Trends in Hearing on June 5, 2022. The revised version has been accepted for publication in IEEE access. See this https URL ( this https URL )
Journal-ref: IEEE access, 25 July 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [34] arXiv:2206.06680 [pdf, other]
Title: Exploring speaker enrolment for few-shot personalisation in emotional vocalisation prediction
Comments: Proceedings of the ICML Expressive Vocalizations Workshop and Competition held in conjunction with the $\mathit{39}^{th}$ International Conference on Machine Learning, Copyright 2022 by the author(s)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [35] arXiv:2206.06908 [pdf, other]
Title: LPCSE: Neural Speech Enhancement through Linear Predictive Coding
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [36] arXiv:2206.07176 [pdf, other]
Title: Frequency-centroid features for word recognition of non-native English speakers
Comments: Published in IEEE Irish Signals & Systems Conference (ISSC), 2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [37] arXiv:2206.07229 [pdf, other]
Title: Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning
Comments: To appear in INTERSPEECH 2022. 5 pages, 4 figures. Substantial text overlap with arXiv:2110.03156
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [38] arXiv:2206.07288 [pdf, other]
Title: Streaming non-autoregressive model for any-to-many voice conversion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [39] arXiv:2206.07289 [pdf, other]
Title: Text-Aware End-to-end Mispronunciation Detection and Diagnosis
Comments: Rejected by Interspeech2022
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [40] arXiv:2206.07293 [pdf, other]
Title: FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement
Comments: The paper has been accepted by ICASSP 2022. 5 pages, 2 figures, 5 tables
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [41] arXiv:2206.07340 [pdf, other]
- [42] arXiv:2206.07347 [pdf, other]
Title: On the Use of Deep Mask Estimation Module for Neural Source Separation Systems
Comments: Accepted by Interspeech 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [43] arXiv:2206.07511 [pdf, other]
Title: Investigating Multi-Feature Selection and Ensembling for Audio Classification
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [44] arXiv:2206.07860 [pdf, other]
Title: EPG2S: Speech Generation and Speech Enhancement based on Electropalatography and Audio Signals using Multimodal Learning
Comments: Accepted By IEEE Signal Processing Letter
Journal-ref: IEEE Signal Processing Letters, vol. 29, p. 2582-2586, 2022
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [45] arXiv:2206.07956 [pdf, other]
Title: Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Comments: accepted by INTERSPEECH2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [46] arXiv:2206.08007 [pdf, ps, other]
Title: DCASE 2022: Comparative Analysis Of CNNs For Acoustic Scene Classification Under Low-Complexity Considerations
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [47] arXiv:2206.08039 [pdf, ps, other]
Title: Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History
Comments: 5 pages, 3 figures, Accepted for INTERSPEECH2022
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [48] arXiv:2206.08170 [pdf, other]
Title: Adversarial Privacy Protection on Speech Enhancement
Comments: 5 pages, 6 figures
Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [49] arXiv:2206.08189 [pdf, other]
Title: Censer: Curriculum Semi-supervised Learning for Speech Recognition Based on Self-supervised Pre-training
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [50] arXiv:2206.08233 [pdf, other]
Title: Event-related data conditioning for acoustic event classification
Comments: Accepted by INTERSPEECH 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)