Sound

Authors and titles for cs.SD in Jun 2022, skipping first 150

Total of 221 entries; showing entries 151-200.

[151] arXiv:2206.00951 (cross-list from eess.AS) [pdf, other]: Title: Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations

Authors: Chang Liu, Zhen-Hua Ling, Ling-Hui Chen

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[152] arXiv:2206.00970 (cross-list from eess.AS) [pdf, other]: Title: Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment

Authors: Shanshan Wang, Archontis Politis, Annamaria Mesaros, Tuomas Virtanen

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[153] arXiv:2206.01205 (cross-list from eess.AS) [pdf, other]: Title: Snow Mountain: Dataset of Audio Recordings of The Bible in Low Resource Languages

Authors: Kavitha Raju, Anjaly V, Ryan Lish, Joel Mathew

Comments: See dataset at this https URL

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[154] arXiv:2206.01948 (cross-list from eess.AS) [pdf, other]: Title: STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

Authors: Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[155] arXiv:2206.02124 (cross-list from eess.AS) [pdf, other]: Title: Sampling Frequency Independent Dialogue Separation

Authors: Jouni Paulus, Matteo Torcoli

Comments: accepted into EUSIPCO 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[156] arXiv:2206.02125 (cross-list from eess.AS) [pdf, other]: Title: Geometrically-Motivated Primary-Ambient Decomposition With Center-Channel Extraction

Authors: Jouni Paulus, Matteo Torcoli

Comments: accepted into EUSIPCO 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[157] arXiv:2206.02147 (cross-list from eess.AS) [pdf, other]: Title: Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech

Authors: Ziyue Jiang, Zhe Su, Zhou Zhao, Qian Yang, Yi Ren, Jinglin Liu, Zhenhui Ye

Comments: v3: fix the introduction for the concurrent similar work of Neural Lexicon Reader (arXiv:2110.09698)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[158] arXiv:2206.02432 (cross-list from eess.AS) [pdf, other]: Title: Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors

Authors: Shota Horiguchi, Shinji Watanabe, Paola Garcia, Yuki Takashima, Yohei Kawaguchi

Comments: Accepted to IEEE/ACM TASLP

Journal-ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 706-720, 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[159] arXiv:2206.02512 (cross-list from eess.AS) [pdf, other]: Title: UTTS: Unsupervised TTS with Conditional Disentangled Sequential Variational Auto-encoder

Authors: Jiachen Lian, Chunlei Zhang, Gopala Krishna Anumanchipalli, Dong Yu

Comments: Under Review

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[160] arXiv:2206.02639 (cross-list from eess.AS) [pdf, other]: Title: Continuous-Time Analog Filters for Audio Edge Intelligence: Review on Circuit Designs

Authors: Kwantae Kim, Shih-Chii Liu

Comments: 17 pages, 19 figures, 1 table

Subjects: Audio and Speech Processing (eess.AS); Hardware Architecture (cs.AR); Sound (cs.SD)
[161] arXiv:2206.03104 (cross-list from stat.AP) [pdf, other]: Title: Crossing the Linguistic Causeway: A Binational Approach for Translating Soundscape Attributes to Bahasa Melayu

Authors: Bhan Lam, Julia Chieng, Karn N. Watcharasupat, Kenneth Ooi, Zhen-Ting Ong, Joo Young Hong, Woon-Seng Gan

Comments: Published in Applied Acoustics in the Special Issue on Soundscape Attributes Translation: Current Projects and Challenges

Journal-ref: Appl. Acoust., vol. 199, p. 108976, Oct. 2022

Subjects: Applications (stat.AP); Sound (cs.SD)
[162] arXiv:2206.03400 (cross-list from eess.AS) [pdf, ps, other]: Title: The Influence of Dataset Partitioning on Dysfluency Detection Systems

Authors: Sebastian P. Bayerl, Dominik Wagner, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

Comments: Accepted at the 25th International Conference on Text, Speech and Dialogue (TSD 2022)

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[163] arXiv:2206.04305 (cross-list from eess.AS) [pdf, other]: Title: Context-based out-of-vocabulary word recovery for ASR systems in Indian languages

Authors: Arun Baby, Saranya Vinnaitherthan, Akhil Kerhalkar, Pranav Jawale, Sharath Adavanne, Nagaraj Adiga

Comments: 12 pages

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[164] arXiv:2206.04850 (cross-list from eess.AS) [pdf, other]: Title: Feature-informed Embedding Space Regularization For Audio Classification

Authors: Yun-Ning Hung, Alexander Lerch

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[165] arXiv:2206.05462 (cross-list from eess.AS) [pdf, other]: Title: Svadhyaya system for the Second Diagnosing COVID-19 using Acoustics Challenge 2021

Authors: Deepak Mittal, Amir H. Poorjam, Debottam Dutta, Debarpan Bhattacharya, Zemin Yu, Sriram Ganapathy, Maneesh Singh

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[166] arXiv:2206.05606 (cross-list from eess.AS) [pdf, other]: Title: Signal-informed DNN-based DOA Estimation combining an External Microphone and GCC-PHAT Features

Authors: Ulrik Kowalk, Simon Doclo, Joerg Bitzer

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[167] arXiv:2206.06192 (cross-list from eess.AS) [pdf, ps, other]: Title: Toward Zero Oracle Word Error Rate on the Switchboard Benchmark

Authors: Arlo Faria, Adam Janin, Korbinian Riedhammer, Sidhi Adkoli

Comments: Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[168] arXiv:2206.06208 (cross-list from eess.AS) [pdf, ps, other]: Title: Automated Evaluation of Standardized Dementia Screening Tests

Authors: Franziska Braun, Markus Förstel, Bastian Oppermann, Andreas Erzigkeit, Thomas Hillemacher, Hartmut Lehfeld, Korbinian Riedhammer

Comments: Submitted to Interspeech 2022. arXiv admin note: text overlap with arXiv:2206.05018

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[169] arXiv:2206.07430 (cross-list from eess.AS) [pdf, ps, other]: Title: Residual Language Model for End-to-end Speech Recognition

Authors: Emiru Tsunoo, Yosuke Kashiwagi, Chaitanya Narisetty, Shinji Watanabe

Comments: Accepted for Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[170] arXiv:2206.07569 (cross-list from eess.AS) [pdf, other]: Title: End-to-End Voice Conversion with Information Perturbation

Authors: Qicong Xie, Shan Yang, Yi Lei, Lei Xie, Dan Su

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[171] arXiv:2206.07917 (cross-list from eess.AS) [pdf, other]: Title: To Dereverb Or Not to Dereverb? Perceptual Studies On Real-Time Dereverberation Targets

Authors: Jean-Marc Valin, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Arvindh Krishnaswamy

Comments: 5 pages

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[172] arXiv:2206.07931 (cross-list from eess.AS) [pdf, other]: Title: DRAFT: A Novel Framework to Reduce Domain Shifting in Self-supervised Learning and Its Application to Children's ASR

Authors: Ruchao Fan, Abeer Alwan

Comments: Accepted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[173] arXiv:2206.08058 (cross-list from eess.AS) [pdf, other]: Title: Nonwords Pronunciation Classification in Language Development Tests for Preschool Children

Authors: Ilja Baumann, Dominik Wagner, Sebastian Bayerl, Tobias Bocklet

Comments: Accepted at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[174] arXiv:2206.08174 (cross-list from eess.AS) [pdf, other]: Title: Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Authors: Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura

Comments: 5 pages, 2 figures, 3 tables. Submitted to Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[175] arXiv:2206.08525 (cross-list from eess.AS) [pdf, other]: Title: Simultaneous Speech Extraction for Multiple Target Speakers under the Meeting Scenarios

Authors: Bang Zeng, Hongbing Suo, Yulong Wan, Ming Li

Comments: 13 pages, 3 figures, Accepted by NCMMSC2023

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[176] arXiv:2206.09072 (cross-list from eess.AS) [pdf, other]: Title: Semi-supervised Time Domain Target Speaker Extraction with Attention

Authors: Zhepei Wang, Ritwik Giri, Shrikant Venkataramani, Umut Isik, Jean-Marc Valin, Paris Smaragdis, Mike Goodwin, Arvindh Krishnaswamy

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[177] arXiv:2206.09102 (cross-list from eess.AS) [pdf, other]: Title: Decoupled Federated Learning for ASR with Non-IID Data

Authors: Han Zhu, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Comments: Accepted by Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Distributed, Parallel, and Cluster Computing (cs.DC); Sound (cs.SD)
[178] arXiv:2206.09396 (cross-list from eess.AS) [pdf, other]: Title: Transfer Learning for Robust Low-Resource Children's Speech ASR with Transformers and Source-Filter Warping

Authors: Jenthe Thienpondt, Kris Demuynck

Comments: proceedings of INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[179] arXiv:2206.09507 (cross-list from eess.AS) [pdf, other]: Title: Resource-Efficient Separation Transformer

Authors: Luca Della Libera, Cem Subakan, Mirco Ravanelli, Samuele Cornell, Frédéric Lepoutre, François Grondin

Comments: Accepted to ICASSP 2024

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[180] arXiv:2206.09523 (cross-list from eess.AS) [pdf, other]: Title: Towards Trustworthy Edge Intelligence: Insights from Voice-Activated Services

Authors: W.T. Hutiri, A.Y. Ding

Subjects: Audio and Speech Processing (eess.AS); Computers and Society (cs.CY); Sound (cs.SD)
[181] arXiv:2206.09556 (cross-list from eess.AS) [pdf, other]: Title: An Empirical Analysis on the Vulnerabilities of End-to-End Speech Segregation Models

Authors: Rahil Parikh, Gaspar Rochette, Carol Espy-Wilson, Shihab Shamma

Comments: Accepted at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP)
[182] arXiv:2206.09783 (cross-list from eess.AS) [pdf, other]: Title: Boosting Cross-Domain Speech Recognition with Self-Supervision

Authors: Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan

Comments: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 2023

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[183] arXiv:2206.11000 (cross-list from eess.AS) [pdf, other]: Title: A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement

Authors: Or Tal, Moshe Mandel, Felix Kreuk, Yossi Adi

Comments: Published @ Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[184] arXiv:2206.11045 (cross-list from eess.AS) [pdf, other]: Title: COVYT: Introducing the Coronavirus YouTube and TikTok speech dataset featuring the same speakers with and without infection

Authors: Andreas Triantafyllopoulos, Anastasia Semertzidou, Meishu Song, Florian B. Pokorny, Björn W. Schuller

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[185] arXiv:2206.11181 (cross-list from eess.AS) [pdf, other]: Title: On the Role of Spatial, Spectral, and Temporal Processing for DNN-based Non-linear Multi-channel Speech Enhancement

Authors: Kristina Tesch, Nils-Hendrik Mohrmann, Timo Gerkmann

Comments: Accepted at Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[186] arXiv:2206.11558 (cross-list from eess.AS) [pdf, other]: Title: Adversarial Multi-Task Learning for Disentangling Timbre and Pitch in Singing Voice Synthesis

Authors: Tae-Woo Kim, Min-Su Kang, Gyeong-Hoon Lee

Comments: Accepted to INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[187] arXiv:2206.11640 (cross-list from eess.AS) [pdf, other]: Title: Speaker-Independent Microphone Identification in Noisy Conditions

Authors: Antonio Giganti, Luca Cuccovillo, Paolo Bestagini, Patrick Aichroth, Stefano Tubaro

Journal-ref: in European Signal Processing Conference (EUSIPCO), Belgrade, Serbia, 2022, pp. 1047-1051

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[188] arXiv:2206.11703 (cross-list from eess.AS) [pdf, other]: Title: Efficient Transformer-based Speech Enhancement Using Long Frames and STFT Magnitudes

Authors: Danilo de Oliveira, Tal Peer, Timo Gerkmann

Comments: Accepted at Interspeech 2022

Journal-ref: Proc. Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[189] arXiv:2206.12040 (cross-list from eess.AS) [pdf, other]: Title: End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue

Authors: Kentaro Mitsui, Tianyu Zhao, Kei Sawada, Yukiya Hono, Yoshihiko Nankaku, Keiichi Tokuda

Comments: 5 pages, 3 figures, accepted for INTERSPEECH 2022. Audio samples: this https URL

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[190] arXiv:2206.12045 (cross-list from eess.AS) [pdf, other]: Title: Confidence Score Based Conformer Speaker Adaptation for Speech Recognition

Authors: Jiajun Deng, Xurong Xie, Tianzi Wang, Mingyu Cui, Boyang Xue, Zengrui Jin, Mengzhe Geng, Guinan Li, Xunying Liu, Helen Meng

Comments: It's accepted to INTERSPEECH 2022. arXiv admin note: text overlap with arXiv:2206.11596

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[191] arXiv:2206.12059 (cross-list from eess.AS) [pdf, ps, other]: Title: Data Augmentation and Squeeze-and-Excitation Network on Multiple Dimension for Sound Event Localization and Detection in Real Scenes

Authors: Byeong-Yun Ko, Hyeonuk Nam, Seong-Hu Kim, Deokki Min, Seung-Deok Choi, Yong-Hwa Park

Comments: Technical Report submitted for DCASE2022 Challenge Task3

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[192] arXiv:2206.12283 (cross-list from eess.AS) [pdf, other]: Title: Open-source objective-oriented framework for head-related transfer function

Authors: Adam Szwajcowski

Comments: Not submitted anywhere in the current form

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[193] arXiv:2206.12285 (cross-list from eess.AS) [pdf, other]: Title: Speech Quality Assessment through MOS using Non-Matching References

Authors: Pranay Manocha, Anurag Kumar

Comments: To Appear, Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[194] arXiv:2206.12297 (cross-list from eess.AS) [pdf, other]: Title: SAQAM: Spatial Audio Quality Assessment Metric

Authors: Pranay Manocha, Anurag Kumar, Buye Xu, Anjali Menon, Israel D. Gebru, Vamsi K. Ithapu, Paul Calamia

Comments: To Appear, Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[195] arXiv:2206.12489 (cross-list from eess.AS) [pdf, other]: Title: Predicting within and across language phoneme recognition performance of self-supervised learning speech pre-trained models

Authors: Hang Ji, Tanvina Patel, Odette Scharenborg

Comments: Submitted to INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[196] arXiv:2206.12774 (cross-list from eess.AS) [pdf, other]: Title: Meta Auxiliary Learning for Low-resource Spoken Language Understanding

Authors: Yingying Gao, Junlan Feng, Chao Deng, Shilei Zhang

Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[197] arXiv:2206.12857 (cross-list from eess.AS) [pdf, other]: Title: Transport-Oriented Feature Aggregation for Speaker Embedding Learning

Authors: Yusheng Tian, Jingyu Li, Tan Lee

Comments: Accepted for presentation at INTERSPEECH 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[198] arXiv:2206.13014 (cross-list from eess.AS) [pdf, other]: Title: Joint Optimization of Sampling Rate Offsets Based on Entire Signal Relationship Among Distributed Microphones

Authors: Yoshiki Masuyama, Kouei Yamaoka, Nobutaka Ono

Comments: 5 pages, 2 figures, accepted by Interspeech 2022

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[199] arXiv:2206.13044 (cross-list from eess.AS) [pdf, other]: Title: Extended U-Net for Speaker Verification in Noisy Environments

Authors: Ju-ho Kim, Jungwoo Heo, Hye-jin Shim, Ha-Jin Yu

Comments: 5 pages, 2 figures, 4 tables, accepted to 2022 Interspeech as a conference paper

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
[200] arXiv:2206.13066 (cross-list from eess.AS) [pdf, other]: Title: Detection of Doctored Speech: Towards an End-to-End Parametric Learn-able Filter Approach

Authors: Rohit Arora

Comments: arXiv admin note: text overlap with arXiv:1904.05441 by other authors

Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
