Sound

Authors and titles for cs.SD in Oct 2021, skipping first 250

[ total of 324 entries: 1-25 | ... | 176-200 | 201-225 | 226-250 | 251-275 | 276-300 | 301-324 ]

[251] arXiv:2110.04654 (cross-list from eess.AS) [pdf, other]: Title: Complex Network-Based Approach for Feature Extraction and Classification of Musical Genres

Authors: Matheus Henrique Pimenta-Zanon, Glaucia Maria Bressan, Fabrício Martins Lopes

Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[252] arXiv:2110.04692 (cross-list from eess.AS) [pdf, other]: Title: Poformer: A simple pooling transformer for speaker verification

Authors: Yufeng Ma, Yiwei Ding, Miao Zhao, Yu Zheng, Min Liu, Minqiang Xu

Comments: submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[253] arXiv:2110.04694 (cross-list from eess.AS) [pdf, other]: Title: Multi-Channel End-to-End Neural Diarization with Distributed Microphones

Authors: Shota Horiguchi, Yuki Takashima, Paola Garcia, Shinji Watanabe, Yohei Kawaguchi

Comments: Accepted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[254] arXiv:2110.04775 (cross-list from eess.AS) [pdf, other]: Title: Estimating the confidence of speech spoofing countermeasure

Authors: Xin Wang, Junichi Yamagishi

Comments: Work in progress. Comments are welcome. Accepted by ICASSP2022. Code is available this https URL Not all the comments from anonymous reviewers can be addressed within 4 pages, apologize for that

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Sound (cs.SD)
[255] arXiv:2110.04791 (cross-list from eess.AS) [pdf, other]: Title: Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain

Authors: Zengwei Yao, Wenjie Pei, Fanglin Chen, Guangming Lu, David Zhang

Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[256] arXiv:2110.04850 (cross-list from eess.AS) [pdf, other]: Title: Direct source and early reflections localization using deep deconvolution network under reverberant environment

Authors: Shan Gao, Xihong Wu, Tianshu Qu

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[257] arXiv:2110.04908 (cross-list from eess.AS) [pdf, other]: Title: DITTO: Data-efficient and Fair Targeted Subset Selection for ASR Accent Adaptation

Authors: Suraj Kothawade, Anmol Mekala, Chandra Sekhara D, Mayank Kothyari, Rishabh Iyer, Ganesh Ramakrishnan, Preethi Jyothi

Comments: ACL 2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[258] arXiv:2110.04948 (cross-list from eess.AS) [pdf, other]: Title: Advancing Momentum Pseudo-Labeling with Conformer and Initialization Strategy

Authors: Yosuke Higuchi, Niko Moritz, Jonathan Le Roux, Takaaki Hori

Comments: Submitted to ICASSP2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[259] arXiv:2110.05036 (cross-list from eess.AS) [pdf, other]: Title: Multi-View Self-Attention Based Transformer for Speaker Recognition

Authors: Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang

Comments: Paper to appear at ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[260] arXiv:2110.05249 (cross-list from eess.AS) [pdf, other]: Title: A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

Authors: Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Comments: Accepted to ASRU2021

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[261] arXiv:2110.05267 (cross-list from eess.AS) [pdf, other]: Title: Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition

Authors: Yuchen Hu, Nana Hou, Chen Chen, Eng Siong Chng

Comments: 5 pages, 7 figures, Accepted by ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[262] arXiv:2110.05431 (cross-list from eess.AS) [pdf, other]: Title: On the invertibility of a voice privacy system using embedding alignement

Authors: Pierre Champion (MULTISPEECH, LIUM), Thomas Thebaud (LIUM), Gaël Le Lan, Anthony Larcher (LIUM), Denis Jouvet (MULTISPEECH)

Journal-ref: ASRU 2021 - IEEE Automatic Speech Recognition and Understanding Workshop, Dec 2021, Cartagena, Colombia

Subjects: Audio and Speech Processing (eess.AS); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Sound (cs.SD)
[263] arXiv:2110.05632 (cross-list from stat.AP) [pdf, other]: Title: Wind-robust sound event detection and denoising for bioacoustics

Authors: Julius Juodakis, Stephen Marsland

Comments: 34 pages, 5 figures, 2 supplementary figures

Subjects: Applications (stat.AP); Sound (cs.SD); Quantitative Methods (q-bio.QM)
[264] arXiv:2110.05695 (cross-list from eess.AS) [pdf, ps, other]: Title: The Mirrornet : Learning Audio Synthesizer Controls Inspired by Sensorimotor Interaction

Authors: Yashish M. Siriwardena, Guilhem Marion, Shihab Shamma

Journal-ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[265] arXiv:2110.05745 (cross-list from eess.AS) [pdf, other]: Title: VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Authors: Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda

Comments: 5 pages, 1 figure, 3 tables, submitted to ICASSP 2022; updated reference information of [33]

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[266] arXiv:2110.05948 (cross-list from eess.SP) [pdf, other]: Title: Denoising Diffusion Gamma Models

Authors: Eliya Nachmani, Robin San Roman, Lior Wolf

Comments: arXiv admin note: substantial text overlap with arXiv:2106.07582

Subjects: Signal Processing (eess.SP); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
[267] arXiv:2110.05994 (cross-list from eess.AS) [pdf, other]: Title: Word Order Does Not Matter For Speech Recognition

Authors: Vineel Pratap, Qiantong Xu, Tatiana Likhomanenko, Gabriel Synnaeve, Ronan Collobert

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[268] arXiv:2110.06126 (cross-list from eess.AS) [pdf, other]: Title: Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

Authors: Ricardo Falcon-Perez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

Comments: 5 pages, 2 figures, 4 tables. Submitted to the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[269] arXiv:2110.06304 (cross-list from eess.AS) [pdf, other]: Title: Generalized Time Domain Velocity Vector

Authors: Srđan Kitić, Jérôme Daniel

Comments: Submitted

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[270] arXiv:2110.06306 (cross-list from eess.AS) [pdf, other]: Title: Fine-grained style control in Transformer-based Text-to-speech Synthesis

Authors: Li-Wei Chen, Alexander Rudnicky

Comments: Accepted in ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[271] arXiv:2110.06309 (cross-list from eess.AS) [pdf, other]: Title: Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition

Authors: Li-Wei Chen, Alexander Rudnicky

Comments: Accepted to ICASSP 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[272] arXiv:2110.06428 (cross-list from eess.AS) [pdf, other]: Title: All-neural beamformer for continuous speech separation

Authors: Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez

Comments: 5 pages, 3 figures, 2 tables

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[273] arXiv:2110.06434 (cross-list from eess.AS) [pdf, other]: Title: DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding

Authors: Sergey Nikonorov, Berrak Sisman, Mingyang Zhang, Haizhou Li

Comments: Accepted to ASRU 2021

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[274] arXiv:2110.06440 (cross-list from eess.AS) [pdf, other]: Title: SDR -- Medium Rare with Fast Computations

Authors: Robin Scheibler

Comments: 5 pages, 3 figures, 2 tables. Submitted to ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[275] arXiv:2110.06546 (cross-list from eess.AS) [pdf, other]: Title: A Melody-Unsupervision Model for Singing Voice Synthesis

Authors: Soonbeom Choi, Juhan Nam

Comments: ICASSP 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
