Title: E2E Segmentation in a Two-Pass Cascaded Encoder ASR Model

Abstract: We explore unifying a neural segmenter with two-pass cascaded encoder ASR into a single model. A key challenge is allowing the segmenter (which runs in real time, synchronously with the decoder) to finalize the 2nd pass (which runs 900 ms behind real time) without introducing user-perceived latency or deletion errors during inference. We propose a design where the neural segmenter is integrated with the causal 1st pass decoder to emit an end-of-segment (EOS) signal in real time. The EOS signal is then used to finalize the non-causal 2nd pass. We experiment with different ways to finalize the 2nd pass, and find that a novel dummy frame injection strategy allows for simultaneous high-quality 2nd pass results and low finalization latency. On a real-world long-form captioning task (YouTube), we achieve a 2.4% relative WER improvement and a 140 ms reduction in EOS latency over a baseline VAD-based segmenter with the same cascaded encoder.
Comments: ICASSP 2023
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2211.15432 [cs.CL]
  (or arXiv:2211.15432v2 [cs.CL] for this version)

Submission history

From: Wenqian Ronny Huang
[v1] Mon, 28 Nov 2022 15:18:07 GMT (289kb,D)
[v2] Sun, 5 Mar 2023 19:21:25 GMT (289kb,D)
