We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

eess.AS

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Computation and Language

Title: Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Abstract: Speaker adaptation is important to build robust automatic speech recognition (ASR) systems. In this work, we investigate various methods for speaker adaptive training (SAT) based on feature-space approaches for a conformer-based acoustic model (AM) on the Switchboard 300h dataset. We propose a method, called Weighted-Simple-Add, which adds weighted speaker information vectors to the input of the multi-head self-attention module of the conformer AM. Using this method for SAT, we achieve 3.5% and 4.5% relative improvement in terms of WER on the CallHome part of Hub5'00 and Hub5'01 respectively. Moreover, we build on top of our previous work where we proposed a novel and competitive training recipe for a conformer-based hybrid AM. We extend and improve this recipe where we achieve 11% relative improvement in terms of word-error-rate (WER) on Switchboard 300h Hub5'00 dataset. We also make this recipe efficient by reducing the total number of parameters by 34% relative.
Comments: Accepted at INTERSPEECH 2022
Subjects: Computation and Language (cs.CL); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
Cite as: arXiv:2206.12955 [cs.CL]
  (or arXiv:2206.12955v1 [cs.CL] for this version)

Submission history

From: Mohammad Zeineldeen [view email]
[v1] Sun, 26 Jun 2022 20:01:08 GMT (97kb,D)

Link back to: arXiv, form interface, contact.