Word Order Does Not Matter For Speech Recognition

Pratap, Vineel; Xu, Qiantong; Likhomanenko, Tatiana; Synnaeve, Gabriel; Collobert, Ronan

(Submitted on 12 Oct 2021 (v1), last revised 18 Oct 2021 (this version, v2))

Abstract: In this paper, we study the training of an automatic speech recognition (ASR) system in a weakly supervised setting where the order of the words in the transcript labels of the audio training data is not known. We train a word-level acoustic model that aggregates the distribution of all output frames using the LogSumExp operation and uses a cross-entropy loss to match the ground-truth word distribution. Using the pseudo-labels generated by this model on the training set, we then train a letter-based acoustic model with the Connectionist Temporal Classification (CTC) loss. Our system achieves word error rates of 2.3%/4.6% on the test-clean/test-other subsets of LibriSpeech, which closely matches the performance of the supervised baseline.
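
The order-free word-level objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes three assumptions:

- The model emits a score for every vocabulary word at each of T output frames.
- LogSumExp pools each word's scores over time, and a log-softmax over the vocabulary then normalizes the pooled scores.
- The ground-truth word distribution is the relative frequency of each word in the transcript.

All function names are hypothetical, and the letter-based CTC pseudo-labeling stage is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) over a sequence of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def pooled_log_probs(frame_scores: List[List[float]]) -> List[float]:
    """Aggregate per-frame word scores (T x V) into one log-distribution over V.

    Assumption: LogSumExp pools each word's score over all frames, then a
    log-softmax over the vocabulary turns the pooled scores into log-probabilities.
    """
    vocab_size = len(frame_scores[0])
    pooled = [logsumexp([frame[v] for frame in frame_scores]) for v in range(vocab_size)]
    norm = logsumexp(pooled)
    return [p - norm for p in pooled]


def bag_of_words_target(transcript: List[str], vocab: Dict[str, int]) -> List[float]:
    """Order-free target: relative frequency of each vocabulary word in the transcript."""
    counts = Counter(transcript)
    target = [0.0] * len(vocab)
    for word, n in counts.items():
        target[vocab[word]] = n / len(transcript)
    return target


def order_free_cross_entropy(
    frame_scores: List[List[float]], transcript: List[str], vocab: Dict[str, int]
) -> float:
    """Cross-entropy between the target word distribution and the pooled prediction."""
    log_p = pooled_log_probs(frame_scores)
    q = bag_of_words_target(transcript, vocab)
    # Only words present in the transcript contribute to the loss.
    return -sum(qv * lp for qv, lp in zip(q, log_p) if qv > 0)
```

The target depends only on word counts, so shuffling the transcript (for example, "the cat sat" versus "sat the cat") leaves the loss unchanged. This is why the word order of the labels is not needed at this stage.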

Subjects:	Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
Cite as:	arXiv:2110.05994 [eess.AS]
	(or arXiv:2110.05994v2 [eess.AS] for this version)

Submission history

From: Vineel Pratap
[v1] Tue, 12 Oct 2021 13:35:01 GMT (2435kb,D)
[v2] Mon, 18 Oct 2021 19:04:13 GMT (2415kb,D)