SpecAugment on Large Scale Datasets

Park, Daniel S.; Zhang, Yu; Chiu, Chung-Cheng; Chen, Youzheng; Li, Bo; Chan, William; Le, Quoc V.; Wu, Yonghui

(Submitted on 11 Dec 2019)

Abstract: Recently, SpecAugment, an augmentation scheme for automatic speech recognition that acts directly on the spectrogram of input utterances, has been shown to be highly effective in enhancing the performance of end-to-end networks on public datasets. In this paper, we demonstrate its effectiveness on large-scale tasks by investigating its application to the Google Multidomain Dataset (Narayanan et al., 2018). We achieve improvements across all test domains by mixing raw training data augmented with SpecAugment and noise-perturbed training data when training the acoustic model. We also introduce a modification of SpecAugment that adapts the time mask size and/or multiplicity depending on the length of the utterance, which can potentially benefit large-scale tasks. By using adaptive masking, we are able to further improve the performance of the Listen, Attend and Spell model on LibriSpeech to 2.2% WER on test-clean and 5.2% WER on test-other.
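The abstract names adaptive masking without giving its exact form. The following is a minimal NumPy sketch, assuming that both the number of time masks and the maximum time-mask width are fixed fractions of the utterance length tau (the hypothetical parameters p_m and p_s), with the mask count capped by a hypothetical max_masks. A plain, non-adaptive frequency mask is included for comparison; time warping is omitted. The function names and default values are illustrative assumptions, not settings taken from the paper. Masked regions are filled with zero, which equals the mean only if the features are assumed to be mean-normalized.

```python
import numpy as np


def adaptive_time_mask(spec, p_m=0.04, p_s=0.04, max_masks=20, rng=None):
    """Hypothetical adaptive time masking on a (time, freq) log-mel array.

    Both the mask count and the maximum mask width scale with the number of
    frames tau. p_m, p_s and max_masks are assumed values, not the paper's.
    """
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()  # never modify the caller's array
    tau = out.shape[0]  # number of time frames in this utterance
    num_masks = min(max_masks, int(p_m * tau))  # multiplicity grows with length
    max_width = int(p_s * tau)  # maximum width is a fraction of the length
    for _ in range(num_masks):
        t = int(rng.integers(0, max_width + 1))  # width drawn from [0, max_width]
        t0 = int(rng.integers(0, tau - t + 1))  # start chosen so the mask fits
        out[t0:t0 + t, :] = 0.0  # zero = mean, assuming normalized features
    return out


def freq_mask(spec, num_masks=2, max_width=27, rng=None):
    """Fixed-size frequency masking; num_masks and max_width are assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    out = spec.copy()
    nu = out.shape[1]  # number of frequency channels
    for _ in range(num_masks):
        f = int(rng.integers(0, min(max_width, nu) + 1))
        f0 = int(rng.integers(0, nu - f + 1))
        out[:, f0:f0 + f] = 0.0
    return out
```

For a 1000-frame utterance with these defaults, int(0.04 * 1000) = 40 is capped to 20 masks of up to 40 frames each, while a 10-frame utterance receives no time masks at all, so longer utterances are masked more heavily in absolute terms.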

Comments:	5 pages, 3 tables; submitted to ICASSP 2020
Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
Cite as:	arXiv:1912.05533 [eess.AS]
	(or arXiv:1912.05533v1 [eess.AS] for this version)

Submission history

From: Daniel Park
[v1] Wed, 11 Dec 2019 18:58:58 GMT (16kb)