Sound

Authors and titles for cs.SD in Oct 2022

Total of 363 entries; showing entries 1-25.

[1] arXiv:2210.00169 [pdf, other]: Title: Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition

Authors: Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda

Comments: Published in INTERSPEECH 2022

Journal-ref: Proc. Interspeech 2022, 1691-1695

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[2] arXiv:2210.00721 [pdf, other]: Title: Efficient acoustic feature transformation in mismatched environments using a Guided-GAN

Authors: Walter Heymans, Marelie H. Davel, Charl van Heerden

Comments: Final published version available at: Efficient acoustic feature transformation in mismatched environments using a Guided-GAN. Speech Communication, 143, pp.10-20

Journal-ref: Speech Communication, 143, pp.10-20 (2022)

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:2210.00753 [pdf, other]: Title: Push-Pull: Characterizing the Adversarial Robustness for Audio-Visual Active Speaker Detection

Authors: Xuanjun Chen, Haibin Wu, Helen Meng, Hung-yi Lee, Jyh-Shing Roger Jang

Comments: Accepted by SLT 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[4] arXiv:2210.01256 [pdf, ps, other]: Title: And what if two musical versions don't share melody, harmony, rhythm, or lyrics ?

Authors: Mathilde Abrassart, Guillaume Doras

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[5] arXiv:2210.01353 [pdf, other]: Title: Pay Self-Attention to Audio-Visual Navigation

Authors: Yinfeng Yu, Lele Cao, Fuchun Sun, Xiaohong Liu, Liejun Wang

Comments: Main paper (10 pages and 7 figures) and appendix (21 figures and 4 tables). Accepted for publication by BMVC 2022. For data and code, see this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[6] arXiv:2210.01448 [pdf, other]: Title: Rhythmic Gesticulator: Rhythm-Aware Co-Speech Gesture Synthesis with Hierarchical Neural Embeddings

Authors: Tenglong Ao, Qingzhe Gao, Yuke Lou, Baoquan Chen, Libin Liu

Comments: SIGGRAPH Asia 2022 (Journal Track); Project Page: this https URL

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Audio and Speech Processing (eess.AS)
[7] arXiv:2210.01703 [pdf, other]: Title: Improving Label-Deficient Keyword Spotting Through Self-Supervised Pretraining

Authors: Holger Severin Bovbjerg, Zheng-Hua Tan

Comments: To be published at ICASSP2023 Workshop on Self-supervision in Audio, Speech and Beyond, 10th of June 2023, Rhodes, Greece. Copyright (c) 2023 IEEE. 5 pages, 3 figures, 3 tables

Subjects: Sound (cs.SD); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8] arXiv:2210.01719 [pdf, other]: Title: Learning Temporal Resolution in Spectrogram for Audio Classification

Authors: Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley

Comments: Accepted by the 38th Annual AAAI Conference on Artificial Intelligence

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Multimedia (cs.MM); Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
[9] arXiv:2210.02287 [pdf, ps, other]: Title: TC-SKNet with GridMask for Low-complexity Classification of Acoustic scene

Authors: Luyuan Xie, Yan Zhong, Lin Yang, Zhaoyu Yan, Zhonghai Wu, Junjie Wang

Comments: Accepted to APSIPA ASC 2022

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[10] arXiv:2210.02437 [pdf, other]: Title: ASVspoof 2021: Towards Spoofed and Deepfake Speech Detection in the Wild

Authors: Xuechen Liu, Xin Wang, Md Sahidullah, Jose Patino, Héctor Delgado, Tomi Kinnunen, Massimiliano Todisco, Junichi Yamagishi, Nicholas Evans, Andreas Nautsch, Kong Aik Lee

Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[11] arXiv:2210.02642 [pdf, other]: Title: Feasibility on Detecting Door Slamming towards Monitoring Early Signs of Domestic Violence

Authors: Osian Morgan, Hakan Kayan, Charith Perera

Comments: In Proceedings of the 2022 IEEE/ACM Seventh International Conference on Internet-of-Things Design and Implementation (IoTDI) 2022

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
[12] arXiv:2210.02731 [src]: Title: PSVRF: Learning to restore Pitch-Shifted Voice without reference

Authors: Yangfu Li, Xiaodan Lin, Jiaxin Yang

Comments: Have some errors

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[13] arXiv:2210.02746 [pdf, other]: Title: The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection

Authors: Daniele Mari, Federica Latora, Simone Milani

Comments: Accepted at WIFS 2022

Subjects: Sound (cs.SD); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14] arXiv:2210.02829 [pdf, other]: Title: Melody Infilling with User-Provided Structural Context

Authors: Chih-Pin Tan, Alvin W.Y. Su, Yi-Hsuan Yang

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:2210.02904 [pdf, other]: Title: WakeUpNet: A Mobile-Transformer based Framework for End-to-End Streaming Voice Trigger

Authors: Zixing Zhang, Thorin Farnsworth, Senling Lin, Salah Karout

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[16] arXiv:2210.03027 [pdf, other]: Title: AnimeTAB: A new guitar tablature dataset of anime and game music

Authors: Yuecheng Zhou, Yaolong Ju, Lingyun Xie

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[17] arXiv:2210.03255 [pdf, other]: Title: Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition

Authors: Somshubra Majumdar, Shantanu Acharya, Vitaly Lavrukhin, Boris Ginsburg

Comments: To appear in Proc. SLT 2022, Jan 09-12, 2023, Doha, Qatar

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[18] arXiv:2210.03360 [pdf, other]: Title: The PerspectiveLiberator -- an upmixing 6DoF rendering plugin for single-perspective Ambisonic room impulse responses

Authors: Kaspar Müller, Franz Zotter

Comments: 4 pages, submitted to conference: DAGA 2021, Vienna, Austria, 2021

Journal-ref: Fortschritte der Akustik - DAGA 2021, Vienna, Austria, 2021, vol. 47, pp. 306-309

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[19] arXiv:2210.03363 [pdf, other]: Title: Model-based estimation of in-car-communication feedback applied to speech zone detection

Authors: Kaspar Müller, Simon Doclo, Jan Østergaard, Tobias Wolff

Comments: 5 pages, submitted to International Workshop on Acoustic Signal Enhancement (IWAENC), Bamberg, Germany, 2022

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[20] arXiv:2210.03538 [pdf, other]: Title: An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era

Authors: Andreas Triantafyllopoulos, Björn W. Schuller, Gökçe İymen, Metin Sezgin, Xiangheng He, Zijiang Yang, Panagiotis Tzirakis, Shuo Liu, Silvan Mertes, Elisabeth André, Ruibo Fu, Jianhua Tao

Comments: Submitted to the Proceedings of IEEE

Subjects: Sound (cs.SD); Machine Learning (cs.LG)
[21] arXiv:2210.03799 [pdf, other]: Title: Supervised and Unsupervised Learning of Audio Representations for Music Understanding

Authors: Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, Fabien Gouyon, Andreas F. Ehmann

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[22] arXiv:2210.04062 [pdf, other]: Title: CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning

Authors: Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li

Comments: Accepted by Interspeech 2023

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[23] arXiv:2210.05037 [pdf, other]: Title: Automated Audio Captioning via Fusion of Low- and High- Dimensional Features

Authors: Jianyuan Sun, Xubo Liu, Xinhao Mei, Mark D. Plumbley, Volkan Kilic, Wenwu Wang

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:2210.05076 [pdf, other]: Title: ConchShell: A Generative Adversarial Networks that Turns Pictures into Piano Music

Authors: Wanpeng Fan, Yuanzhi Su, Yuxin Huang

Comments: 5 pages

Subjects: Sound (cs.SD); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)
[25] arXiv:2210.05092 [pdf, other]: Title: The DKU-Tencent System for the VoxCeleb Speaker Recognition Challenge 2022

Authors: Xiaoyi Qin, Na Li, Yuke Lin, Yiwei Ding, Chao Weng, Dan Su, Ming Li

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
