Sound

Authors and titles for cs.SD in Feb 2023, skipping first 75

[ total of 179 entries: 1-25 | 26-50 | 51-75 | 76-100 | 101-125 | 126-150 | 151-175 | 176-179 ]

[76] arXiv:2302.02088 (cross-list from cs.CV)

Title: AV-NeRF: Learning Neural Fields for Real-World Audio-Visual Scene Synthesis

Authors: Susan Liang, Chao Huang, Yapeng Tian, Anurag Kumar, Chenliang Xu

Comments: NeurIPS 2023

Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[77] arXiv:2302.02419 (cross-list from cs.CL)

Title: Deep learning of segment-level feature representation for speech emotion recognition in conversations

Authors: Jiachen Luo, Huy Phan, Joshua Reiss

Comments: 6 pages, 4 figures

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[78] arXiv:2302.03124 (cross-list from cs.LG)

Title: Autodecompose: A generative self-supervised model for semantic decomposition

Authors: Mohammad Reza Bonyadi

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[79] arXiv:2302.03498 (cross-list from cs.CL)

Title: MAC: A unified framework boosting low resource automatic speech recognition

Authors: Zeping Min, Qian Ge, Zhong Li, Weinan E

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[80] arXiv:2302.03533 (cross-list from cs.CV)

Title: Revisiting Pre-training in Audio-Visual Learning

Authors: Ruoxuan Feng, Wenke Xia, Di Hu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[81] arXiv:2302.04331 (cross-list from cs.LG)

Title: Short-Term Memory Convolutions

Authors: Grzegorz Stefański, Krzysztof Arendt, Paweł Daniluk, Bartłomiej Jasik, Artur Szumaczuk

Comments: ICLR 2023

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[82] arXiv:2302.04959 (cross-list from cs.LG)

Title: Hypernetworks build Implicit Neural Representations of Sounds

Authors: Filip Szatkowski, Karol J. Piczak, Przemysław Spurek, Jacek Tabor, Tomasz Trzciński

Comments: ECML 2023

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[83] arXiv:2302.05040 (cross-list from cs.CL)

Title: PATCorrect: Non-autoregressive Phoneme-augmented Transformer for ASR Error Correction

Authors: Ziji Zhang, Zhehui Wang, Rajesh Kamma, Sharanya Eswaran, Narayanan Sadagopan

Comments: Accepted camera-ready version for INTERSPEECH 2023

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[84] arXiv:2302.06008 (cross-list from cs.CL)

Title: ASR Bundestag: A Large-Scale political debate dataset in German

Authors: Johannes Wirth, René Peinl

Comments: 13 pages, 2 tables, 4 figures

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[85] arXiv:2302.07560 (cross-list from cs.LG)

Title: Unsupervised classification to improve the quality of a bird song recording dataset

Authors: Félix Michaud (ISYEB), Jérôme Sueur (ISYEB), Maxime Le Cesne (ISYEB), Sylvain Haupert (ISYEB)

Journal-ref: Ecological Informatics, 2023, pp.101952

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[86] arXiv:2302.08088 (cross-list from cs.CL)

Title: TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement

Authors: Yunyang Zeng, Joseph Konan, Shuo Han, David Bick, Muqiao Yang, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

Comments: Accepted at ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[87] arXiv:2302.08102 (cross-list from cs.CL)

Title: Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition

Authors: Minsu Kim, Hyung-Il Kim, Yong Man Ro

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[88] arXiv:2302.08607 (cross-list from cs.NE)

Title: Adaptive Axonal Delays in feedforward spiking neural networks for accurate spoken word recognition

Authors: Pengfei Sun, Ehsan Eqlimi, Yansong Chua, Paul Devos, Dick Botteldooren

Comments: Accepted by ICASSP 2023

Subjects: Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[89] arXiv:2302.08794 (cross-list from cs.HC)

Title: Build a training interface to install the bat's echolocation skills in humans

Authors: Miyoko Tsumaki, Yu Teshima, Takao Tsuchiya, Kaoru Ashihara, Kohta I. Kobayasi, Shizuko Hiryu

Comments: 4 pages, 3 figures

Subjects: Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[90] arXiv:2302.08950 (cross-list from cs.CL)

Title: Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

Authors: Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun

Comments: Accepted to Interspeech 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[91] arXiv:2302.09328 (cross-list from cs.MM)

Title: SSVMR: Saliency-based Self-training for Video-Music Retrieval

Authors: Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yaowei Li, Yuexian Zou

Comments: Accepted by ICASSP 2023

Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[92] arXiv:2302.09723 (cross-list from cs.CL)

Title: Emphasizing Unseen Words: New Vocabulary Acquisition for End-to-End Speech Recognition

Authors: Leyuan Qu, Cornelius Weber, Stefan Wermter

Comments: Neural Networks, Volume 161, April 2023, Pages 494-504

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[93] arXiv:2302.09856 (cross-list from cs.CL)

Title: Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition

Authors: Zihan Zhao, Yu Wang, Yanfeng Wang

Comments: Accepted to IEEE ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[94] arXiv:2302.10871 (cross-list from cs.CL)

Title: Efficient CTC Regularization via Coarse Labels for End-to-End Speech Translation

Authors: Biao Zhang, Barry Haddow, Rico Sennrich

Comments: EACL 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[95] arXiv:2302.10915 (cross-list from cs.LG)

Title: Conformers are All You Need for Visual Speech Recognition

Authors: Oscar Chang, Hank Liao, Dmitriy Serdyuk, Ankit Shah, Olivier Siohan

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[96] arXiv:2302.11224 (cross-list from cs.CL)

Title: MADI: Inter-domain Matching and Intra-domain Discrimination for Cross-domain Speech Recognition

Authors: Jiaming Zhou, Shiwan Zhao, Ning Jiang, Guoqing Zhao, Yong Qin

Comments: Accepted to ICASSP 2023

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[97] arXiv:2302.12049 (cross-list from cs.CL)

Title: Evaluating Automatic Speech Recognition in an Incremental Setting

Authors: Ryan Whetten, Mir Tahsin Imtiaz, Casey Kennington

Comments: 5 pages

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[98] arXiv:2302.12057 (cross-list from cs.CL)

Title: ProsAudit, a prosodic benchmark for self-supervised speech models

Authors: Maureen de Seyssel, Marvin Lavechin, Hadrien Titeux, Arthur Thomas, Gwendal Virlet, Andrea Santos Revilla, Guillaume Wisniewski, Bogdan Ludusan, Emmanuel Dupoux

Comments: Accepted at Interspeech 2023. 4 pages + references, 1 figure

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[99] arXiv:2302.12829 (cross-list from cs.CL)

Title: Improving Massively Multilingual ASR With Auxiliary CTC Objectives

Authors: William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe

Comments: 5 pages, 1 figure, accepted at ICASSP 2023; fixed typo and URL in abstract

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[100] arXiv:2302.12921 (cross-list from cs.CL)

Title: Pre-Finetuning for Few-Shot Emotional Speech Recognition

Authors: Maximillian Chen, Zhou Yu

Comments: 5 pages, 4 figures. Code available at this https URL

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
