Sound

Authors and titles for cs.SD in Oct 2022

Total of 363 entries; showing entries 1-10.

[1] arXiv:2210.00169

Title: Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition

Authors: Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda

Comments: Published in INTERSPEECH 2022

Journal-ref: Proc. Interspeech 2022, 1691-1695

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[2] arXiv:2210.00721

Title: Efficient acoustic feature transformation in mismatched environments using a Guided-GAN

Authors: Walter Heymans, Marelie H. Davel, Charl van Heerden

Comments: Final published version available at: Efficient acoustic feature transformation in mismatched environments using a Guided-GAN. Speech Communication, 143, pp.10-20

Journal-ref: Speech Communication, 143, pp.10-20 (2022)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[3] arXiv:2210.00753

Title: Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

Authors: Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang

Comments: Accepted by SLT 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)

[4] arXiv:2210.01256

Title: And what if two musical versions don't share melody, harmony, rhythm, or lyrics ?

Authors: Mathilde Abrassart, Guillaume Doras

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[5] arXiv:2210.01353

Title: Pay Self-Attention to Audio-Visual Navigation

Authors: Yinfeng Yu, Lele Cao, Fuchun Sun, Xiaohong Liu, Liejun Wang

Comments: Main paper (10 pages and 7 figures) and appendix (21 figures and 4 tables). Accepted for publication by BMVC 2022. For data and code, see this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)

[6] arXiv:2210.01448

Title: Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings

Authors: Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, Libin Liu

Comments: SIGGRAPH Asia 2022 (Journal Track); Project Page: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Audio and Speech Processing (eess.AS)

[7] arXiv:2210.01703

Title: Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining

Authors: Holger Severin Bovbjerg, Zheng-Hua Tan

Comments: To be published at ICASSP2023 Workshop on Self-supervision in Audio, Speech and Beyond, 10th of June 2023, Rhodes, Greece. Copyright (c) 2023 IEEE. 5 pages, 3 figures, 3 tables

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[8] arXiv:2210.01719

Title: Learning Temporal Resolution in Spectrogram for Audio Classification

Authors: Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

Comments: Accepted by the 38th Annual AAAI Conference on Artificial Intelligence

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)

[9] arXiv:2210.02287

Title: TC-SKNet with GridMask for Low-complexity Classification of Acoustic scene

Authors: Luyuan Xie, Yan Zhong, Lin Yang, Zhaoyu Yan, Zhonghai Wu, Junjie Wang

Comments: Accepted to APSIPA ASC 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

[10] arXiv:2210.02437

Title: ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild

Authors: Xuechen Liu, Xin Wang, Md Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas Evans, Andreas Nautsch, Kong Aik Lee

Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
