Bypass Temporal Classification: Weakly Supervised Automatic Speech Recognition with Imperfect Transcripts

Gao, Dongji; Wiesner, Matthew; Xu, Hainan; Garcia, Leibny Paola; Povey, Daniel; Khudanpur, Sanjeev

(Submitted on 1 Jun 2023)

Abstract: This paper presents a novel algorithm for building an automatic speech recognition (ASR) model with imperfect training data. Imperfectly transcribed speech is a prevalent issue in human-annotated speech corpora, which degrades the performance of ASR models. To address this problem, we propose Bypass Temporal Classification (BTC) as an expansion of the Connectionist Temporal Classification (CTC) criterion. BTC explicitly encodes the uncertainties associated with transcripts during training. This is accomplished by enhancing the flexibility of the training graph, which is implemented as a weighted finite-state transducer (WFST) composition. The proposed algorithm improves the robustness and accuracy of ASR systems, particularly when working with imprecisely transcribed speech corpora. Our implementation will be open-sourced.
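
As a rough illustration of what "enhancing the flexibility of the training graph" could mean, the sketch below scores a CTC-style graph with a forward pass and optionally adds a bypass state next to each transcript token. It is a minimal sketch under stated assumptions, not the authors' implementation: the `WILDCARD` label, the `bypass_penalty` parameter, the rule that a wildcard matches any non-blank token, and the function names are all hypothetical, and the graph is built directly rather than by WFST composition.

```python
import math

BLANK = 0          # index of the CTC blank symbol (assumed to be 0)
WILDCARD = -1      # hypothetical label: matches any non-blank token
NEG_INF = -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over an iterable of log-scores."""
    values = list(values)
    peak = max(values, default=NEG_INF)
    if peak == NEG_INF:
        return NEG_INF
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def build_graph(tokens, bypass_penalty=None):
    """Return (labels, preds, start, final) for a linear CTC-style graph.

    With bypass_penalty set, each token state gets a parallel wildcard
    state (an assumption made for illustration only).
    """
    labels, preds = [], []

    def add_state(label, predecessors):
        labels.append(label)
        index = len(labels) - 1
        preds.append(set(predecessors) | {index})  # every state self-loops
        return index

    prev_emitting, start = [], []
    for i, token in enumerate(tokens):
        blank = add_state(BLANK, [s for s, _ in prev_emitting])
        # CTC rule: a repeated token must be separated by a blank.
        direct = [s for s, lab in prev_emitting if lab != token]
        emitting = [(add_state(token, [blank] + direct), token)]
        if bypass_penalty is not None:
            any_prev = [s for s, _ in prev_emitting]
            emitting.append((add_state(WILDCARD, [blank] + any_prev), WILDCARD))
        if i == 0:
            start = [blank] + [s for s, _ in emitting]
        prev_emitting = emitting
    last_blank = add_state(BLANK, [s for s, _ in prev_emitting])
    if not tokens:
        start = [last_blank]
    final = [last_blank] + [s for s, _ in prev_emitting]
    return labels, preds, start, final


def graph_log_score(log_probs, tokens, bypass_penalty=None):
    """Log of the summed score of all graph paths over the given frames.

    log_probs: one list of per-symbol log-probabilities per frame.
    """
    labels, preds, start, final = build_graph(tokens, bypass_penalty)

    def emit(frame, label):
        if label == WILDCARD:
            # Assumed wildcard score: total non-blank mass minus a penalty.
            return logsumexp(frame[1:]) - bypass_penalty
        return frame[label]

    states = range(len(labels))
    alpha = [emit(log_probs[0], labels[s]) if s in start else NEG_INF
             for s in states]
    for frame in log_probs[1:]:
        alpha = [emit(frame, labels[s]) + logsumexp(alpha[p] for p in preds[s])
                 for s in states]
    return logsumexp(alpha[s] for s in final)


# Two frames that favour token 1, scored against a correct and a wrong transcript.
frames = [[math.log(p) for p in (0.1, 0.8, 0.1)]] * 2
correct = graph_log_score(frames, [1])
wrong = graph_log_score(frames, [2])
wrong_with_bypass = graph_log_score(frames, [2], bypass_penalty=1.0)
```

In this toy setup the wrong transcript `[2]` scores far below the correct one under the plain graph, while the bypass states let the frames that disagree with the transcript be absorbed at a fixed cost, so `wrong_with_bypass` is higher than `wrong`. How the penalty is chosen or scheduled in the paper is not stated in the abstract.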

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2306.01031 [cs.CL]
	(or arXiv:2306.01031v1 [cs.CL] for this version)

Submission history

From: Dongji Gao
[v1] Thu, 1 Jun 2023 14:56:19 GMT (1197kb,D)
