Audio and Speech Processing
Authors and titles for eess.AS in Dec 2019
Total of 110 entries; entries 1-50 shown.
- [1] arXiv:1912.00938 [pdf, ps, other]
  Title: Speaker detection in the wild: Lessons learned from JSALT 2019
  Authors: Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak
  Comments: Submitted to ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [2] arXiv:1912.01167 [pdf, other]
  Title: High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [3] arXiv:1912.01679 [pdf, other]
  Title: Deep Contextualized Acoustic Representations For Semi-Supervised Speech Recognition
  Comments: Accepted to ICASSP 2020 (oral)
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [4] arXiv:1912.01777 [pdf, other]
  Title: Integrating Knowledge into End-to-End Speech Recognition from External Text-Only Data
  Comments: Submitted TASLP
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [5] arXiv:1912.02591 [pdf, other]
  Title: Investigating U-Nets with various Intermediate Blocks for Spectrogram-based Singing Voice Separation
  Comments: 8 pages 4 tables 6 figures, accepted to ISMIR 2020
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Multimedia (cs.MM); Sound (cs.SD); Machine Learning (stat.ML)
- [6] arXiv:1912.02606 [pdf, other]
  Title: Predominant Musical Instrument Classification based on Spectral Features
  Comments: Appeared in Proceedings of SPIN 2020
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [7] arXiv:1912.02608 [pdf, other]
  Title: SEEF-ALDR: A Speaker Embedding Enhancement Framework via Adversarial Learning based Disentangled Representation
  Comments: 12 pages, 4 figures, Accepted by ACSAC 2020
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [8] arXiv:1912.02610 [pdf, other]
  Title: Bimodal Speech Emotion Recognition Using Pre-Trained Language Models
  Comments: Life-Long Learning for Spoken Language Systems ASRU 2019
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [9] arXiv:1912.02613 [pdf, other]
  Title: Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders
  Comments: Accepted to ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [10] arXiv:1912.02615 [pdf, other]
  Title: Audiovisual Transformer Architectures for Large-Scale Classification and Synchronization of Weakly Labeled Audio Events
  Journal-ref: Proceedings of the 27th ACM International Conference on Multimedia (MM '19). ACM, New York, NY, USA, 1961-1969
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [11] arXiv:1912.02671 [pdf, other]
  Title: Audio-Visual Target Speaker Enhancement on Multi-Talker Environment using Event-Driven Cameras
  Comments: Accepted at ISCAS 2021
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
- [12] arXiv:1912.02958 [pdf, other]
  Title: Synchronous Transformers for End-to-End Speech Recognition
  Comments: Accepted by ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG)
- [13] arXiv:1912.03363 [pdf, other]
  Title: Audio-attention discriminative language model for ASR rescoring
  Comments: 4 pages, 1 figure, Accepted at ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [14] arXiv:1912.03627 [pdf, ps, other]
  Title: A Multi Purpose and Large Scale Speech Corpus in Persian and English for Speaker and Speech Recognition: the DeepMine Database
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [15] arXiv:1912.04067 [pdf, other]
  Title: Visualizing Deep Neural Networks for Speech Recognition with Learned Topographic Filter Maps
  Comments: Accepted for 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [16] arXiv:1912.04370 [pdf, other]
  Title: Cross-Language Aphasia Detection using Optimal Transport Domain Adaptation
  Authors: Aparna Balagopalan, Jekaterina Novikova, Matthew B. A. McDermott, Bret Nestor, Tristan Naumann, Marzyeh Ghassemi
  Comments: Accepted to ML4H at NeurIPS 2019
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [17] arXiv:1912.04381 [pdf, ps, other]
  Title: A Dataset for measuring reading levels in India at scale
  Comments: 5 pages, 3 figures, 3 Tables, Paper accepted to ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS); Computers and Society (cs.CY); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [18] arXiv:1912.04700 [pdf, ps, other]
  Title: Development and Evaluation of Video Recordings for the OLSA Matrix Sentence Test
  Authors: Gerard Llorach, Frederike Kirschner, Giso Grimm, Melanie A. Zokoll, Kirsten C. Wagener, Volker Hohmann
  Comments: 10 pages, 9 figures
  Subjects: Audio and Speech Processing (eess.AS); Image and Video Processing (eess.IV)
- [19] arXiv:1912.04844 [pdf, other]
  Title: Quantifying the Chaos Level of Infants' Environment via Unsupervised Learning
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG)
- [20] arXiv:1912.04979 [pdf, other]
  Title: Advances in Online Audio-Visual Meeting Transcription
  Authors: Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou
  Comments: To appear in Proc. IEEE ASRU Workshop 2019
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Image and Video Processing (eess.IV)
- [21] arXiv:1912.05038 [pdf, other]
  Title: Cooperative Audio Source Separation and Enhancement Using Distributed Microphone Arrays and Wearable Devices
  Comments: To appear at CAMSAP 2019
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [22] arXiv:1912.05043 [pdf, other]
  Title: Motion-Tolerant Beamforming with Deformable Microphone Arrays
  Comments: Presented at WASPAA 2019
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [23] arXiv:1912.05472 [pdf, ps, other]
  Title: Audiogmenter: a MATLAB Toolbox for Audio Data Augmentation
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
- [24] arXiv:1912.05533 [pdf, ps, other]
  Title: SpecAugment on Large Scale Datasets
  Authors: Daniel S. Park, Yu Zhang, Chung-Cheng Chiu, Youzheng Chen, Bo Li, William Chan, Quoc V. Le, Yonghui Wu
  Comments: 5 pages, 3 tables; submitted to ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [25] arXiv:1912.05869 [pdf, other]
  Title: On Neural Phone Recognition of Mixed-Source ECoG Signals
  Authors: Ahmed Hussen Abdelaziz, Shuo-Yiin Chang, Nelson Morgan, Erik Edwards, Dorothea Kolossa, Dan Ellis, David A. Moses, Edward F. Chang
  Comments: 5 pages, showing algorithms, results and references from our collaboration during a 2017 postdoc stay of the first author
  Subjects: Audio and Speech Processing (eess.AS); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Neurons and Cognition (q-bio.NC)
- [26] arXiv:1912.05881 [pdf, other]
  Title: Singing Synthesis: with a little help from my attention
  Comments: Submitted to Interspeech 2020
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [27] arXiv:1912.05920 [pdf, other]
  Title: Measuring Mother-Infant Emotions By Audio Sensing
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [28] arXiv:1912.05946 [src]
  Title: Leveraging End-to-End Speech Recognition with Neural Architecture Search
  Comments: A large part of the document needs to be reviewed to meet current standards in the Automatic Speech Recognition
  Journal-ref: IJSER, vol 10, Issue 11, 2019, pp 1113-1119
  Subjects: Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD)
- [29] arXiv:1912.06311 [pdf, ps, other]
  Title: Short-duration Speaker Verification (SdSV) Challenge 2021: the Challenge Evaluation Plan
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [30] arXiv:1912.06813 [pdf, other]
  Title: Voice Transformer Network: Sequence-to-Sequence Voice Conversion Using Transformer with Text-to-Speech Pretraining
  Comments: Preprint. Work in progress
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
- [31] arXiv:1912.08462 [pdf, other]
  Title: End-to-end training of time domain audio separation and recognition
  Authors: Thilo von Neumann, Keisuke Kinoshita, Lukas Drude, Christoph Boeddeker, Marc Delcroix, Tomohiro Nakatani, Reinhold Haeb-Umbach
  Comments: 5 pages, 1 figure, to appear in ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [32] arXiv:1912.09003 [pdf, ps, other]
  Title: LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [33] arXiv:1912.09251 [pdf, other]
  Title: Personalization of End-to-end Speech Recognition On Mobile Devices For Named Entities
  Authors: Khe Chai Sim, Françoise Beaufays, Arnaud Benard, Dhruv Guliani, Andreas Kabel, Nikhil Khare, Tamar Lucassen, Petr Zadrazil, Harry Zhang, Leif Johnson, Giovanni Motta, Lillian Zhou
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [34] arXiv:1912.10026 [pdf, other]
  Title: Calibration and reference simulations for the auditory periphery model of Verhulst et al. 2018 version 1.2
  Comments: In response to the email from ArXiv about the previous submission of this document (on 11 Dec 2019, to alejandro.osses@ugent.be, arXiv #295093), the new document is now self-contained and adds new information, as highlighted in the abstract of this document. Please do not hesitate in contacting me back for any further detail. Alejandro
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [35] arXiv:1912.10442 [pdf, ps, other]
  Title: End-Point Detection with State Transition Model based on Chunk-Wise Classification
  Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
- [36] arXiv:1912.10647 [pdf, other]
  Title: Mixture of Inference Networks for VAE-based Audio-visual Speech Enhancement
  Comments: IEEE Transactions on Signal Processing
  Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
- [37] arXiv:1912.11040 [pdf, ps, other]
  Title: end-to-end training of a large vocabulary end-to-end speech recognition system
  Authors: Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda
  Comments: Accepted and presented at the ASRU 2019 conference
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP); Machine Learning (stat.ML)
- [38] arXiv:1912.11041 [pdf, ps, other]
  Title: power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition
  Comments: Accepted and presented at the ASRU 2019 conference
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [39] arXiv:1912.11151 [pdf, other]
  Title: A Cycle-GAN Approach to Model Natural Perturbations in Speech for ASR Applications
  Comments: 7 pages, 3 figures, ICASSP-2019
  Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
- [40] arXiv:1912.11547 [pdf, other]
  Title: Learning Transferable Features for Speech Emotion Recognition
  Comments: ACM-MM'17, October 23-27, 2017
  Journal-ref: Proceedings of the on Thematic Workshops of ACM Multimedia 2017. ACM, 2017. Pages 529-536
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
- [41] arXiv:1912.11781 [pdf, ps, other]
  Title: Multi-Source Direction-of-Arrival Estimation Using Improved Estimation Consistency Method
  Authors: Rohith Mars (1), Hiroyuki Ehara (2), Srikanth Nagisetty (1), Chong Soon Lim (1) ((1) Panasonic R&D Center, Singapore, (2) Panasonic Corporation, Japan)
  Subjects: Audio and Speech Processing (eess.AS)
- [42] arXiv:1912.11793 [pdf, ps, other]
  Title: Attention-based ASR with Lightweight and Dynamic Convolutions
  Comments: ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS)
- [43] arXiv:1912.12023 [pdf, other]
  Title: Monaural Speech Enhancement Using a Multi-Branch Temporal Convolutional Network
  Comments: There are some inappropriate decriptions. These descriptions exist on many pages
  Subjects: Audio and Speech Processing (eess.AS); Signal Processing (eess.SP)
- [44] arXiv:1912.12384 [pdf, other]
  Title: Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models
  Comments: Accepted and presented at the ASRU 2019 conference
  Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD); Signal Processing (eess.SP); Machine Learning (stat.ML)
- [45] arXiv:1912.13307 [pdf, ps, other]
  Title: Attention-based gated scaling adaptative acoustic model for ctc-based speech recognition
  Comments: 5 pages,2 figures, submitted to ICASSP 2020
  Subjects: Audio and Speech Processing (eess.AS)
- [46] arXiv:1912.00087 (cross-list from eess.SP) [pdf, ps, other]
  Title: Effects of a Hovering Unmanned Aerial Vehicle on Urban Soundscapes Perception
  Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS)
- [47] arXiv:1912.00364 (cross-list from eess.SP) [pdf, other]
  Title: V-Shaped Sparse Arrays For 2-D DOA Estimation
  Authors: Ahmet M. Elbir
  Comments: Accepted paper in Circuits, Systems and Signal Processing, 2019
  Subjects: Signal Processing (eess.SP); Audio and Speech Processing (eess.AS)
- [48] arXiv:1912.01542 (cross-list from eess.SP) [pdf, ps, other]
  Title: Design of an algorithm for acoustic signal detection of moving vehicles
  Authors: Daniel Blasco Avellaneda
  Comments: 5 pages, 5 figures
  Subjects: Signal Processing (eess.SP); Sound (cs.SD); Audio and Speech Processing (eess.AS)
- [49] arXiv:1912.04357 (cross-list from eess.SP) [pdf, other]
  Title: DeepMUSIC: Multiple Signal Classification via Deep Learning
  Authors: Ahmet M. Elbir
  Comments: To appear in IEEE Sensors Letters, 5 pages, 5 figures
  Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
- [50] arXiv:1912.09428 (cross-list from eess.SP) [pdf, other]
  Title: Location Forensics Analysis Using ENF Sequences Extracted from Power and Audio Recordings
  Comments: 5 pages, 5 figures, conference paper
  Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)