Sound

Authors and titles for cs.SD in Mar 2023

Total of 232 entries; showing entries 1-25.

[1] arXiv:2303.00204 [pdf, other]: Title: PCF: ECAPA-TDNN with Progressive Channel Fusion for Speaker Verification

Authors: Zhenduo Zhao, Zhuo Li, Wenchao Wang, Pengyuan Zhang

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[2] arXiv:2303.00264 [pdf, other]: Title: Distance-based Weight Transfer from Near-field to Far-field Speaker Verification

Authors: Li Zhang, Qing Wang, Hongji Wang, Yue Li, Wei Rao, Yannan Wang, Lei Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[3] arXiv:2303.00332 [pdf, other]: Title: CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

Authors: Hui Wang, Siqi Zheng, Yafeng Chen, Luyao Cheng, Qian Chen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:2303.00502 [pdf, other]: Title: On the Audio-visual Synchronization for Lip-to-Speech Synthesis

Authors: Zhe Niu, Brian Mak

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
[5] arXiv:2303.00510 [pdf, other]: Title: A Comparison of Speech Data Augmentation Methods Using S3PRL Toolkit

Authors: Mina Huh, Ruchira Ray, Corey Karnei

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6] arXiv:2303.00747 [pdf, other]: Title: WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Authors: Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman

Comments: Accepted to INTERSPEECH 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[7] arXiv:2303.01125 [pdf, other]: Title: Distilling Multi-Level X-vector Knowledge for Small-footprint Speaker Verification

Authors: Xuechen Liu, Md Sahidullah, Tomi Kinnunen

Comments: Submitted to Data & Knowledge Engineering at Dec. 2023. Copyright may be transferred without notice

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:2303.01126 [pdf, other]: Title: Speaker-Aware Anti-Spoofing

Authors: Xuechen Liu, Md Sahidullah, Kong Aik Lee, Tomi Kinnunen

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Audio and Speech Processing (eess.AS)
[9] arXiv:2303.01211 [pdf, other]: Title: Learning From Yourself: A Self-Distillation Method for Fake Speech Detection

Authors: Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, Zhao Lv

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[10] arXiv:2303.01507 [pdf, other]: Title: Defending against Adversarial Audio via Diffusion Model

Authors: Shutong Wu, Jiongxiao Wang, Wei Ping, Weili Nie, Chaowei Xiao

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11] arXiv:2303.01508 [pdf, other]: Title: Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities

Authors: Shijun Wang, Jón Guðnason, Damian Borth

Comments: Accepted by ICASSP2023

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[12] arXiv:2303.01639 [pdf, other]: Title: WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions

Authors: Jun Rekimoto

Comments: ACM CHI 2023 paper

Journal-ref: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), April 23--28, 2023

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Audio and Speech Processing (eess.AS)
[13] arXiv:2303.01664 [pdf, other]: Title: Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations

Authors: Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Yu Zhang, Wei Han, Ankur Bapna, Michiel Bacchiani

Comments: Accepted to WASPAA 2023

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14] arXiv:2303.01665 [pdf, other]: Title: LooperGP: A Loopable Sequence Model for Live Coding Performance using GuitarPro Tablature

Authors: Sara Adkins, Pedro Sarmento, Mathieu Barthet

Comments: The Version of Record of this contribution is published in Proceedings of EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2023

Journal-ref: EvoMUSART: International Conference on Computational Intelligence in Music, Sound, Art and Design (Part of EvoStar) 2023

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[15] arXiv:2303.01694 [pdf, other]: Title: DWFormer: Dynamic Window transFormer for Speech Emotion Recognition

Authors: Shuaiqi Chen, Xiaofen Xing, Weibin Zhang, Weidong Chen, Xiangmin Xu

Comments: 4 pages, 5 figures, 3 tables, accepted by 2023 International Conference on Acoustics, Speech, and Signal Processing (ICASSP2023)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[16] arXiv:2303.01812 [pdf, other]: Title: Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers

Authors: Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Yujun Wang

Comments: ICASSP 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2303.01864 [pdf, ps, other]: Title: Spectrogram Inversion for Audio Source Separation via Consistency, Mixing, and Magnitude Constraints

Authors: Paul Magron, Tuomas Virtanen

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:2303.01875 [pdf, other]: Title: Decoding and Visualising Intended Emotion in an Expressive Piano Performance

Authors: Shreyan Chowdhury, Gerhard Widmer

Comments: Extended version of Late-Breaking Demo Session paper accepted at ISMIR 2022 (23rd Int. Society for Music Information Retrieval Conf., Bengaluru, India, 2022)

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2303.01879 [pdf, other]: Title: Low-Complexity Audio Embedding Extractors

Authors: Florian Schmid, Khaled Koutini, Gerhard Widmer

Comments: In Proceedings of the 31st European Signal Processing Conference, EUSIPCO 2023. Source Code available at: this https URL

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2303.01884 [pdf, other]: Title: AutoMatch: A Large-scale Audio Beat Matching Benchmark for Boosting Deep Learning Assistant Video Editing

Authors: Sen Pei, Jingya Yu, Qi Chen, Wozhou He

Comments: 11 pages, 5 figures

Subjects: Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[21] arXiv:2303.02348 [pdf, other]: Title: The DKU Post-Challenge Audio-Visual Wake Word Spotting System for the 2021 MISP Challenge: Deep Analysis

Authors: Haoxu Wang, Ming Cheng, Qiang Fu, Ming Li

Comments: Accepted by ICASSP 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[22] arXiv:2303.02396 [pdf, other]: Title: A General Framework for Learning Procedural Audio Models of Environmental Sounds

Authors: Danzel Serrano, Mark Cartwright

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[23] arXiv:2303.02599 [pdf, ps, other]: Title: Hybrid Y-Net Architecture for Singing Voice Separation

Authors: Rashen Fernando, Pamudu Ranasinghe, Udula Ranasinghe, Janaka Wijayakulasooriya, Pantaleon Perera

Comments: Submitted for EUSIPCO23: 5 Pages, 7 figures

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:2303.02665 [pdf, other]: Title: Heterogeneous Graph Learning for Acoustic Event Classification

Authors: Amir Shirian, Mona Ahmadian, Krishna Somandepalli, Tanaya Guha

Comments: arXiv admin note: text overlap with arXiv:2207.07935

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[25] arXiv:2303.02673 [pdf, other]: Title: Time-frequency Network for Robust Speaker Recognition

Authors: Jiguo Li, Tianzi Zhang, Xiaobin Liu, Lirong Zheng

Comments: 5pages, 3 figures

Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
