LMs with a Voice: Spoken Language Modeling beyond Speech Tokens

Nachmani, Eliya; Levkovitch, Alon; Salazar, Julian; Asawaroengchai, Chulayuth; Mariooryad, Soroosh; Skerry-Ryan, RJ; Ramanovich, Michelle Tadmor

(Submitted on 24 May 2023 (v1), revised 1 Jun 2023 (this version, v2), latest version 20 Oct 2023 (v3))

Abstract: We present SPECTRON, a novel approach to adapting pre-trained language models (LMs) to perform speech continuation. By leveraging pre-trained speech encoders, our model generates both text and speech outputs, and the entire system is trained end-to-end, operating directly on spectrograms. Training the entire model in the spectrogram domain simplifies our speech continuation system compared with existing cascade methods, which use discrete speech representations. We further show that our method surpasses existing spoken language models in both semantic content and speaker preservation, while also benefiting from the knowledge transferred from pre-existing models. Audio samples can be found on our website: this https URL
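The abstract contrasts modelling speech as continuous spectrogram frames with cascade systems built on discrete speech representations. The toy sketch below is not SPECTRON's code; it only illustrates that contrast under loudly stated assumptions. A naive Hann-windowed DFT stands in for a real spectrogram front end, and a hypothetical two-entry nearest-centroid codebook stands in for a learned discrete-unit vocabulary. Mapping frames to token indices and decoding them back loses information, while the continuous frames are kept exactly.

```python
import cmath
import math


def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding at the end)."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame.

    Uses a periodic Hann window and a naive O(n^2) DFT; a stand-in for a
    real STFT or mel front end, chosen only to keep the sketch stdlib-only.
    """
    n = len(frame)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    bins = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        bins.append(abs(acc))
    return bins


def spectrogram(x, frame_len=32, hop=16):
    """Continuous representation: one real-valued vector per frame."""
    return [magnitude_spectrum(f) for f in frame_signal(x, frame_len, hop)]


def quantize(frames, codebook):
    """Discrete representation: index of the nearest codeword for each frame.

    The codebook here is hypothetical; real discrete-unit systems learn theirs.
    """
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [min(range(len(codebook)), key=lambda j: sqdist(f, codebook[j])) for f in frames]


def mean_sq_error(a_frames, b_frames):
    """Mean squared difference over all bins of two equally shaped frame lists."""
    total = sum(sum((p - q) ** 2 for p, q in zip(a, b)) for a, b in zip(a_frames, b_frames))
    return total / sum(len(a) for a in a_frames)


# Toy input: a chirp whose frequency rises, so successive frames differ.
signal = [math.sin(2 * math.pi * (0.05 + 0.0005 * t) * t) for t in range(256)]
spec = spectrogram(signal)

# Hypothetical two-entry codebook built from the first and last frames.
codebook = [spec[0], spec[-1]]
tokens = quantize(spec, codebook)
decoded = [codebook[i] for i in tokens]

print("tokens:", tokens)
print("continuous round-trip error:", mean_sq_error(spec, spec))
print("discrete round-trip error:", mean_sq_error(spec, decoded))
```

Practical systems use larger learned codebooks and perceptually scaled features, so the size of this error says nothing about any particular model; the sketch only shows that a quantization step is lossy by construction.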

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2305.15255 [cs.CL]
	(or arXiv:2305.15255v2 [cs.CL] for this version)

Submission history

From: Eliya Nachmani
[v1] Wed, 24 May 2023 15:39:43 GMT (184kb,D)
[v2] Thu, 1 Jun 2023 08:04:19 GMT (184kb,D)
[v3] Fri, 20 Oct 2023 05:55:39 GMT (225kb,D)
