Sound

Authors and titles for cs.SD in Dec 2019

[ total of 90 entries; showing entries 1-50 ]

[1] arXiv:1912.00766 [pdf, other]: Title: Three Orthogonal Dimensions for Psychoacoustic Sonification

Authors: Tim Ziemer, Holger Schultheis

Comments: Keywords: Auditory Display, Audition, Noise/acoustics, Sound Design, Interpretability

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS); Neurons and Cognition (q-bio.NC)
[2] arXiv:1912.01219 [pdf, other]: Title: WaveFlow: A Compact Flow-based Model for Raw Audio

Authors: Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

Comments: Published at ICML 2020. Code and pre-trained models: this https URL

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[3] arXiv:1912.01231 [pdf, other]: Title: HI-MIA : A Far-field Text-Dependent Speaker Verification Database and the Baselines

Authors: Xiaoyi Qin, Hui Bu, Ming Li

Comments: Accepted at ICASSP 2020

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[4] arXiv:1912.01852 [pdf, other]: Title: PitchNet: Unsupervised Singing Voice Conversion with Pitch Adversarial Network

Authors: Chengqi Deng, Chengzhu Yu, Heng Lu, Chao Weng, Dong Yu

Comments: Accepted by ICASSP 2020

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[5] arXiv:1912.02461 [pdf, ps, other]: Title: Towards Robust Neural Vocoding for Speech Generation: A Survey

Authors: Po-chun Hsu, Chun-hsuan Wang, Andy T. Liu, Hung-yi Lee

Comments: Submitted to INTERSPEECH 2020

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[6] arXiv:1912.02522 [pdf, other]: Title: VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge

Authors: Joon Son Chung, Arsha Nagrani, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman

Comments: ISCA Archive

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[7] arXiv:1912.03679 [pdf, other]: Title: A Supervised Speech enhancement Approach with Residual Noise Control for Voice Communication

Authors: Andong Li, Chengshi Zheng, Xiaodong Li

Comments: 5 pages, 2 figures, Submitted to Signal Processing Letters

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[8] arXiv:1912.03884 [pdf, other]: Title: MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing

Authors: Chao-I Tuan, Yuan-Kuei Wu, Hung-yi Lee, Yu Tsao

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[9] arXiv:1912.04761 [pdf, other]: Title: Sound Event Detection of Weakly Labelled Data with CNN-Transformer and Automatic Threshold Optimization

Authors: Qiuqiang Kong, Yong Xu, Wenwu Wang, Mark D. Plumbley

Comments: 11 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[10] arXiv:1912.05124 [pdf, other]: Title: Small-footprint Keyword Spotting with Graph Convolutional Network

Authors: Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei

Comments: Accepted by the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[11] arXiv:1912.05289 [pdf, ps, other]: Title: Voice Conversion for Whispered Speech Synthesis

Authors: Marius Cotescu, Thomas Drugman, Goeric Huybrechts, Jaime Lorenzo-Trueba, Alexis Moinet

Comments: Submitted to IEEE Signal Processing Letters

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[12] arXiv:1912.05537 [pdf, other]: Title: Encoding Musical Style with Transformer Autoencoders

Authors: Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[13] arXiv:1912.05683 [pdf, other]: Title: Learning to Model Aspects of Hearing Perception Using Neural Loss Functions

Authors: Prateek Verma, Jonathan Berger

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[14] arXiv:1912.06808 [pdf, other]: Title: Environmental Sound Classification with Parallel Temporal-spectral Attention

Authors: Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang

Comments: Submitted to INTERSPEECH 2020

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[15] arXiv:1912.08888 [pdf, other]: Title: Scattering in Feedback Delay Networks

Authors: Sebastian J. Schlecht, Emanuël A. P. Habets

Subjects: Sound (cs.SD)
[16] arXiv:1912.10128 [pdf, other]: Title: Learning Singing From Speech

Authors: Liqiang Zhang, Chengzhu Yu, Heng Lu, Chao Weng, Yusong Wu, Xiang Xie, Zijin Li, Dong Yu

Comments: Submitted to ICASSP 2020

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[17] arXiv:1912.10211 [pdf, other]: Title: PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

Authors: Qiuqiang Kong, Yin Cao, Turab Iqbal, Yuxuan Wang, Wenwu Wang, Mark D. Plumbley

Comments: 14 pages

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[18] arXiv:1912.10292 [pdf, other]: Title: Deep Audio Prior

Authors: Yapeng Tian, Chenliang Xu, Dingzeyu Li

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[19] arXiv:1912.10458 [pdf, other]: Title: Emotion Recognition from Speech

Authors: Kannan Venkataramanan, Haresh Rengaraj Rajamohan

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[20] arXiv:1912.10815 [pdf, ps, other]: Title: Wykorzystanie sztucznej inteligencji do generowania treści muzycznych (Using Artificial Intelligence to Generate Musical Content)

Authors: Mateusz Dorobek

Comments: Bachelor Thesis, in Polish

Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[21] arXiv:1912.11333 [src]: Title: Audio-based automatic mating success prediction of giant pandas

Authors: WeiRan Yan, MaoLin Tang, Qijun Zhao, Peng Chen, Dunwu Qi, Rong Hou, Zhihe Zhang

Comments: The manuscript needs further revision

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[22] arXiv:1912.11585 [pdf, other]: Title: THUEE system description for NIST 2019 SRE CTS Challenge

Authors: Yi Liu, Tianyu Liang, Can Xu, Xianwei Zhang, Xianhong Chen, Wei-Qiang Zhang, Liang He, Dandan Song, Ruyun Li, Yangcheng Wu, Peng Ouyang, Shouyi Yin

Comments: This is the system description of THUEE submitted to NIST SRE 2019

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[23] arXiv:1912.11613 [pdf, other]: Title: Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation

Authors: Lu Huang, Gaofeng Cheng, Pengyuan Zhang, Yi Yang, Shumin Xu, Jiasong Sun

Comments: Proceedings of APSIPA Annual Summit and Conference 2019, 18-21 November 2019, Lanzhou, China

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[24] arXiv:1912.11747 [pdf, other]: Title: Score and Lyrics-Free Singing Voice Generation

Authors: Jen-Yu Liu, Yu-Hua Chen, Yin-Cheng Yeh, Yi-Hsuan Yang

Comments: Accepted by International Conference on Computational Creativity (ICCC) 2020

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[25] arXiv:1912.11984 [pdf, ps, other]: Title: MoEVC: A Mixture-of-experts Voice Conversion System with Sparse Gating Mechanism for Accelerating Online Computation

Authors: Yu-Tao Chang, Yuan-Hong Yang, Yu-Huai Peng, Syu-Siang Wang, Tai-Shih Chi, Yu Tsao, Hsin-Min Wang

Comments: Submitted to ICASSP 2020

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[26] arXiv:1912.12011 [pdf, ps, other]: Title: Cross-scale Attention Model for Acoustic Event Classification

Authors: Xugang Lu, Peng Shen, Sheng Li, Yu Tsao, Hisashi Kawai

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[27] arXiv:1912.12055 [pdf, other]: Title: nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks

Authors: Kin Wai Cheuk, Hans Anderson, Kat Agres, Dorien Herremans

Comments: Accepted in IEEE Access

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[28] arXiv:1912.12602 [pdf, other]: Title: Complex Cepstrum-based Decomposition of Speech for Glottal Source Estimation

Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[29] arXiv:1912.12604 [pdf, other]: Title: Glottal Source Processing: from Analysis to Applications

Authors: Thomas Drugman, Paavo Alku, Abeer Alwan, Bayya Yegnanarayana

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[30] arXiv:1912.12609 [pdf, other]: Title: A Comparative Study of Pitch Extraction Algorithms on a Large Variety of Singing Sounds

Authors: Onur Babacan, Thomas Drugman, Nicolas d'Alessandro, Nathalie Henrich, Thierry Dutoit

Subjects: Sound (cs.SD); Audio and Speech Processing (eess.AS)
[31] arXiv:1912.12825 [pdf, other]: Title: Neural Architecture Search on Acoustic Scene Classification

Authors: Jixiang Li, Chuming Liang, Bo Zhang, Zhao Wang, Fei Xiang, Xiangxiang Chu

Comments: Accepted to Interspeech 2020

Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[32] arXiv:1912.12843 [pdf, other]: Title: Causal-Anticausal Decomposition of Speech using Complex Cepstrum for Glottal Source Estimation

Authors: Thomas Drugman, Baris Bozkurt, Thierry Dutoit

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[33] arXiv:1912.12887 [pdf, other]: Title: Using a Pitch-Synchronous Residual Codebook for Hybrid HMM/Frame Selection Speech Synthesis

Authors: Thomas Drugman, Alexis Moinet, Thierry Dutoit, Geoffrey Wilfart

Subjects: Sound (cs.SD); Computation and Language (cs.CL); Audio and Speech Processing (eess.AS)
[34] arXiv:1912.00846 (cross-list from cs.LG) [pdf, other]: Title: Attentive Modality Hopping Mechanism for Speech Emotion Recognition

Authors: Seunghyun Yoon, Subhadeep Dey, Hwanhee Lee, Kyomin Jung

Comments: 5 pages, Accepted as a conference paper at ICASSP 2020

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Machine Learning (stat.ML)
[35] arXiv:1912.00955 (cross-list from cs.CL) [pdf, other]: Title: Dynamic Prosody Generation for Speech Synthesis using Linguistics-Driven Acoustic Embedding Selection

Authors: Shubhi Tyagi, Marco Nicolis, Jonas Rohnke, Thomas Drugman, Jaime Lorenzo-Trueba

Journal-ref: INTERSPEECH 2020: 4407-4411

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[36] arXiv:1912.01728 (cross-list from cs.CL) [pdf, other]: Title: Fast Intent Classification for Spoken Language Understanding

Authors: Akshit Tyagi, Varun Sharma, Rahul Gupta, Lynn Samson, Nan Zhuang, Zihang Wang, Bill Campbell

Comments: Accepted as a conference paper at ICASSP 20

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[37] arXiv:1912.03010 (cross-list from cs.CL) [pdf, other]: Title: Semantic Mask for Transformer based End-to-End Speech Recognition

Authors: Chengyi Wang, Yu Wu, Yujiao Du, Jinyu Li, Shujie Liu, Liang Lu, Shuo Ren, Guoli Ye, Sheng Zhao, Ming Zhou

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[38] arXiv:1912.04487 (cross-list from cs.CV) [pdf, other]: Title: Listen to Look: Action Recognition by Previewing Audio

Authors: Ruohan Gao, Tae-Hyun Oh, Kristen Grauman, Lorenzo Torresani

Comments: Appears in CVPR 2020; Project page: this http URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[39] arXiv:1912.04784 (cross-list from cs.CL) [pdf, ps, other]: Title: A Novel Topology for End-to-end Temporal Classification and Segmentation with Recurrent Neural Network

Authors: Taiyang Zhao

Comments: 4 pages, 3 figures

Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[40] arXiv:1912.07011 (cross-list from cs.CV) [pdf, other]: Title: BatVision: Learning to See 3D Spatial Layout with Two Ears

Authors: Jesper Haahr Christensen, Sascha Hornauer, Stella Yu

Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[41] arXiv:1912.07050 (cross-list from cs.CL) [pdf, ps, other]: Title: Computational Induction of Prosodic Structure

Authors: Dafydd Gibbon

Comments: 29 pages, 10 figures, code appendix, to appear in "Studies in Prosodic Grammar"

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[42] arXiv:1912.07756 (cross-list from cs.LG) [pdf, ps, other]: Title: Data augmentation approaches for improving animal audio classification

Authors: Loris Nanni, Gianluca Maguolo, Michelangelo Paci

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[43] arXiv:1912.07875 (cross-list from cs.CL) [pdf, ps, other]: Title: Libri-Light: A Benchmark for ASR with Limited or No Supervision

Authors: Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[44] arXiv:1912.08639 (cross-list from cs.CV) [pdf, other]: Title: Detecting Adversarial Attacks On Audiovisual Speech Recognition

Authors: Pingchuan Ma, Stavros Petridis, Maja Pantic

Comments: Accepted to ICASSP 2021

Subjects: Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[45] arXiv:1912.09261 (cross-list from cs.LG) [pdf, ps, other]: Title: Practical applicability of deep neural networks for overlapping speaker separation

Authors: Pieter Appeltans, Jeroen Zegers, Hugo Van hamme

Comments: Interspeech 2019

Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[46] arXiv:1912.10131 (cross-list from cs.MM) [pdf, other]: Title: Leveraging Topics and Audio Features with Multimodal Attention for Audio Visual Scene-Aware Dialog

Authors: Shachi H Kumar, Eda Okur, Saurav Sahay, Jonathan Huang, Lama Nachman

Comments: Presented at the 3rd Visually Grounded Interaction and Language (ViGIL) Workshop, NeurIPS 2019, Vancouver, Canada. arXiv admin note: substantial text overlap with arXiv:1812.08407, arXiv:1912.10132

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[47] arXiv:1912.10915 (cross-list from cs.CL) [pdf, other]: Title: Probing the phonetic and phonological knowledge of tones in Mandarin TTS models

Authors: Jian Zhu

Comments: Submitted to Speech Prosody 2020

Subjects: Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[48] arXiv:1912.11474 (cross-list from cs.CV) [pdf, other]: Title: SoundSpaces: Audio-Visual Navigation in 3D Environments

Authors: Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman

Comments: Accepted to ECCV 2020 (Spotlight). Project page: this http URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[49] arXiv:1912.11684 (cross-list from cs.CV) [pdf, other]: Title: Look, Listen, and Act: Towards Audio-Visual Embodied Navigation

Authors: Chuang Gan, Yiwei Zhang, Jiajun Wu, Boqing Gong, Joshua B. Tenenbaum

Comments: Accepted by ICRA 2020. Project page: this http URL

Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Robotics (cs.RO); Sound (cs.SD); Audio and Speech Processing (eess.AS)
[50] arXiv:1912.12362 (cross-list from cs.MM) [pdf, other]: Title: Structural characterization of musical harmonies

Authors: Maria Rojo González, Simone Santini

Subjects: Multimedia (cs.MM); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
