Title: SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data

Abstract: How to boost speech pre-training with textual data remains an unsolved problem because speech and text are very different modalities with distinct characteristics. In this paper, we propose a cross-modal Speech and Language Model (SpeechLM) to explicitly align speech and text pre-training with a pre-defined unified discrete representation. Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities, a phoneme-unit tokenizer and a hidden-unit tokenizer, each of which can be trained using a small amount of paired speech-text data. Based on the trained tokenizers, we convert the unlabeled speech and text data into tokens of phoneme units or hidden units. The pre-training objective is designed to unify speech and text into the same discrete semantic space with a unified Transformer network. We evaluate SpeechLM on various spoken language processing tasks, including speech recognition, speech translation, and the universal representation evaluation framework SUPERB, demonstrating significant improvements on content-related tasks. Code and models are available at this https URL
Comments: We have corrected the errors in the pre-training data for the SpeechLM-P Base models; the results have been updated accordingly
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Audio and Speech Processing (eess.AS)
Cite as: arXiv:2209.15329 [cs.CL]
  (or arXiv:2209.15329v3 [cs.CL] for this version)
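
The idea in the abstract of mapping both modalities into one discrete unit space can be illustrated with a toy example. This is only a minimal sketch, not the authors' implementation: the phoneme inventory, the lexicon, the frame labels, and the function names `text_to_units` and `speech_to_units` are all hypothetical, and collapsing repeated frame labels is an assumption made here purely for illustration. A real phoneme-unit tokenizer would be a trained model rather than a lookup table.

```python
from itertools import groupby

# Hypothetical toy phoneme inventory shared by both modalities (assumption).
PHONEMES = ["<pad>", "HH", "AH", "L", "OW", "W", "ER", "D"]
UNIT_ID = {p: i for i, p in enumerate(PHONEMES)}

# Hypothetical toy pronunciation lexicon, for illustration only.
LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def text_to_units(text):
    """Map text to phoneme-unit ids by lexicon lookup."""
    units = []
    for word in text.lower().split():
        units.extend(UNIT_ID[p] for p in LEXICON[word])
    return units


def speech_to_units(frame_labels):
    """Map frame-level phoneme labels to unit ids.

    The labels stand in for the output of an acoustic model (assumed here);
    consecutive duplicates are collapsed so one unit remains per segment.
    """
    return [UNIT_ID[label] for label, _ in groupby(frame_labels)]


text_units = text_to_units("hello world")

# Stand-in for per-frame predictions on an utterance of "hello world".
frames = ["HH", "HH", "AH", "AH", "AH", "L", "OW", "OW",
          "W", "ER", "ER", "L", "D", "D"]
speech_units = speech_to_units(frames)

# Both modalities now share one vocabulary of unit ids.
print(text_units)
print(speech_units)
```

Once both inputs are expressed in the same unit vocabulary, a single network can in principle be trained on either modality against the same targets, which is the alignment the abstract describes.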

Submission history

From: Ziqiang Zhang
[v1] Fri, 30 Sep 2022 09:12:10 GMT (1330kb,D)
[v2] Fri, 28 Apr 2023 02:28:01 GMT (0kb,I)
[v3] Thu, 15 Jun 2023 14:43:48 GMT (9211kb,D)
