Sound
Authors and titles for cs.SD in Feb 2023
Total of 179 entries; entries 1-50 shown.
- [1] arXiv:2302.00286 [pdf, other]
Title: Jointist: Simultaneous Improvement of Multi-instrument Transcription and Music Source Separation via Joint Training
Authors: Kin Wai Cheuk, Keunwoo Choi, Qiuqiang Kong, Bochen Li, Minz Won, Ju-Chiang Wang, Yun-Ning Hung, Dorien Herremans
Comments: arXiv admin note: text overlap with arXiv:2206.10805
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [2] arXiv:2302.00646 [pdf, other]
Title: Epic-Sounds: A Large-scale Dataset of Actions That Sound
Comments: 6 pages, 4 figures
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [3] arXiv:2302.00868 [pdf, other]
Title: Speech Enhancement for Virtual Meetings on Cellular Networks
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [4] arXiv:2302.01090 [pdf, other]
Title: Goniometers are a Powerful Acoustic Feature for Music Information Retrieval Tasks
Authors: Tim Ziemer
Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
- [5] arXiv:2302.02257 [pdf, other]
Title: Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
Authors: Giorgio Mariani, Irene Tallini, Emilian Postolache, Michele Mancusi, Luca Cosmo, Emanuele Rodolà
Comments: ICLR 2024 oral presentation. Demo page: this https URL
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [6] arXiv:2302.02845 [pdf, other]
Title: Audio Representation Learning by Distilling Video as Privileged Information
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [7] arXiv:2302.02945 [pdf, ps, other]
Title: Improved Vehicle Sub-type Classification for Acoustic Traffic Monitoring
Comments: Accepted at Twenty-Ninth National Conference on Communications(NCC) 23 - 26 February, Indian Institute of Technology Guwahati
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [8] arXiv:2302.03540 [pdf, other]
Title: Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Authors: Eugene Kharitonov, Damien Vincent, Zalán Borsos, Raphaël Marinier, Sertan Girgin, Olivier Pietquin, Matt Sharifi, Marco Tagliasacchi, Neil Zeghidour
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [9] arXiv:2302.03917 [pdf, other]
Title: Noise2Music: Text-conditioned Music Generation with Diffusion Models
Authors: Qingqing Huang, Daniel S. Park, Tao Wang, Timo I. Denk, Andy Ly, Nanxin Chen, Zhengdong Zhang, Zhishuai Zhang, Jiahui Yu, Christian Frank, Jesse Engel, Quoc V. Le, William Chan, Zhifeng Chen, Wei Han
Comments: 15 pages
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [10] arXiv:2302.04456 [pdf, other]
Title: ERNIE-Music: Text-to-Waveform Music Generation with Diffusion Models
Comments: Accepted by AACL demo 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [11] arXiv:2302.04469 [pdf, other]
Title: Joint Acoustic Echo Cancellation and Speech Dereverberation Using Kalman filters
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [12] arXiv:2302.04577 [pdf, other]
Title: Incorporating Total Variation Regularization in the design of an intelligent Query by Humming system
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [13] arXiv:2302.05393 [pdf, other]
Title: GTR-CTRL: Instrument and Genre Conditioning for Guitar-Focused Music Generation with Transformers
Comments: This preprint is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). The Version of Record of this contribution is published in Proceedings of EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2023
Journal-ref: EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [14] arXiv:2302.05690 [pdf, ps, other]
Title: Attention does not guarantee best performance in speech enhancement
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [15] arXiv:2302.05693 [pdf, ps, other]
Title: Local spectral attention for full-band speech enhancement
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [16] arXiv:2302.05725 [pdf, other]
Title: Parameterizable Acoustical Modeling and Auralization of Cultural Heritage Sites based on Photogrammetry
Authors: Dominik Ukolov
Comments: 6 pages, 3 figures, 27th Conference on Cultural Heritage and New Technologies (Vienna, 2022)
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [17] arXiv:2302.05940 [pdf, other]
Title: SemanticAC: Semantics-Assisted Framework for Audio Classification
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [18] arXiv:2302.07640 [pdf, other]
Title: Detection and classification of vocal productions in large scale audio recordings
Authors: Guillem Bonafos, Pierre Pudlo, Jean-Marc Freyermuth, Thierry Legou, Joël Fagot, Samuel Tronçon, Arnaud Rey
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Applications (stat.AP)
- [19] arXiv:2302.08095 [pdf, other]
Title: PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement
Authors: Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj
Comments: Accepted at ICASSP 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [20] arXiv:2302.08130 [pdf, other]
Title: Personalized Audio Quality Preference Prediction
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [21] arXiv:2302.08136 [pdf, ps, other]
Title: An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification
Authors: Zhi Zhong, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Shusuke Takahashi, Yuki Mitsufuji
Comments: To appear at ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [22] arXiv:2302.08137 [pdf, other]
Title: ACE-VC: Adaptive and Controllable Voice Conversion using Explicitly Disentangled Self-supervised Speech Representations
Comments: Published as a conference paper at ICASSP 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [23] arXiv:2302.08296 [pdf, other]
Title: QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [24] arXiv:2302.08632 [pdf, other]
Title: jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning Research
Authors: Tosiron Adegbija
Comments: To Appear at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [25] arXiv:2302.08650 [pdf, other]
Title: Gaussian-smoothed Imbalance Data Improves Speech Emotion Recognition
Comments: 5 pages
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [26] arXiv:2302.08841 [pdf, other]
Title: Lip-to-Speech Synthesis in the Wild with Multi-task Learning
Comments: Accepted at ICASSP 2023. Demo available: this https URL
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [27] arXiv:2302.08921 [pdf, other]
Title: Deep Implicit Distribution Alignment Networks for Cross-Corpus Speech Emotion Recognition
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [28] arXiv:2302.09198 [pdf, other]
Title: Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Comments: Dataset and codes will be available at this https URL
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [29] arXiv:2302.09214 [pdf, other]
Title: Cost-effective Models for Detecting Depression from Speech
Comments: Accepted to ICMLA 2022
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
- [30] arXiv:2302.09908 [pdf, other]
Title: A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One
Comments: Accepted by IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [31] arXiv:2302.09991 [pdf, ps, other]
Title: Towards Measuring and Scoring Speaker Diarization Fairness
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [32] arXiv:2302.10248 [pdf, ps, other]
Title: VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
Authors: Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [33] arXiv:2302.10340 [pdf, ps, other]
Title: pykanto: a python library to accelerate research on wild bird song
Authors: Nilo Merino Recalde
Comments: 9 pages, 3 figures
Journal-ref: Methods in Ecology and Evolution, 00, 1-9
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Populations and Evolution (q-bio.PE); Quantitative Methods (q-bio.QM)
- [34] arXiv:2302.10536 [pdf, other]
Title: Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing
Comments: Demo Samples at this https URL
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [35] arXiv:2302.10657 [pdf, other]
Title: DasFormer: Deep Alternating Spectrogram Transformer for Multi/Single-Channel Speech Separation
Comments: 5 pages, accepted by ICASSP2023
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
- [36] arXiv:2302.10686 [pdf, other]
Title: Interpretable Spectrum Transformation Attacks to Speaker Recognition
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [37] arXiv:2302.10924 [pdf, other]
Title: A Reinforcement Learning Framework for Online Speaker Diarization
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [38] arXiv:2302.10983 [pdf, other]
Title: Do Orcas Have Semantic Language? Machine Learning to Predict Orca Behaviors Using Partially Labeled Vocalization Data
Authors: Sophia Sandholm
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [39] arXiv:2302.11192 [pdf, other]
Title: Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation
Comments: Accepted by ICASSP 2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [40] arXiv:2302.11254 [pdf, other]
Title: Cross-modal Audio-visual Co-learning for Text-independent Speaker Verification
Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
- [41] arXiv:2302.11343 [pdf, other]
Title: Advancing Stuttering Detection via Data Augmentation, Class-Balanced Loss and Multi-Contextual Deep Learning
Comments: Accepted in IEEE Journal of Biomedical Health Informatics 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [42] arXiv:2302.11558 [pdf, other]
Title: Improving Speech Enhancement via Event-based Query
Comments: Accepted by ICASSP2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [43] arXiv:2302.11824 [pdf, other]
Title: MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions
Comments: 5 pages, 3 figures, accepted by ICASSP 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [44] arXiv:2302.11832 [pdf, other]
Title: D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement
Comments: 5 pages, 3 figures, accepted by ICASSP 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [45] arXiv:2302.11981 [pdf, other]
Title: Unsupervised Noise adaptation using Data Simulation
Comments: Accepted by ICASSP2023
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [46] arXiv:2302.11989 [pdf, other]
Title: Metric-oriented Speech Enhancement using Diffusion Probabilistic Model
Comments: Accepted by ICASSP2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
- [47] arXiv:2302.12258 [pdf, other]
Title: Data leakage in cross-modal retrieval training: A case study
Comments: 5 pages. Accepted at ICASSP2023
Subjects: Sound (cs.SD); Computation and Language (cs.CL); Information Retrieval (cs.IR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [48] arXiv:2302.12434 [pdf, other]
Title: Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion
Authors: Jiangyi Deng (1), Yanjiao Chen (1), Yinan Zhong (1), Qianhao Miao (1), Xueluan Gong (2), Wenyuan Xu (1) ((1) Zhejiang University, (2) Wuhan University)
Comments: Accepted by USENIX Security Symposium 2023. Please cite this paper as "Jiangyi Deng, Yanjiao Chen, Yinan Zhong, Qianhao Miao, Xueluan Gong, Wenyuan Xu. Catch You and I Can: Revealing Source Voiceprint Against Voice Conversion. In 32nd USENIX Security Symposium (USENIX Security 23)."
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
- [49] arXiv:2302.12716 [pdf, other]
Title: Supervised Hierarchical Clustering using Graph Neural Networks for Speaker Diarization
Comments: 5 pages including references. Accepted in ICASSP 2023
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [50] arXiv:2302.12773 [pdf, other]
Title: Towards multi-task learning of speech and speaker recognition
Comments: accepted to interspeech 2023
Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)