On The Robustness of Self-Supervised Representations for Spoken Language Modeling

Gat, Itai; Kreuk, Felix; Lee, Ann; Copet, Jade; Synnaeve, Gabriel; Dupoux, Emmanuel; Adi, Yossi

(Submitted on 30 Sep 2022 (this version), latest version 29 May 2023 (v2))

Abstract: Self-supervised representations have been extensively studied for discriminative and generative tasks. However, their robustness has not been extensively investigated. This work focuses on self-supervised representations for spoken generative language models. First, we empirically demonstrate that current state-of-the-art speech representation models lack robustness to basic signal variations that do not alter the spoken information. To overcome this, we propose an effective and efficient method to learn robust self-supervised speech representations for generative spoken language modeling. The proposed approach is based on applying a set of signal transformations to the speech signal and optimizing the model using an iterative pseudo-labeling scheme. Our method significantly improves over the evaluated baselines on the considered encoding metrics. We additionally evaluate our method on the speech-to-speech translation task. We consider Spanish-English and French-English translation and empirically demonstrate the benefits of the proposed approach.
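The abstract's two ingredients, content-preserving signal transformations and iterative pseudo-labeling, can be illustrated with a deliberately tiny sketch. It is not the authors' implementation, and every name in it is hypothetical. Each frame is a single scalar "feature", a nearest-centroid quantizer stands in for the speech encoder plus unit quantizer, and a constant offset stands in for the paper's signal transformations. The unit-disagreement rate is a simple illustrative measure, not the paper's encoding metrics. Each round labels the clean signal with the current quantizer, then refits the quantizer on clean and transformed frames that carry those pseudo-labels.

```python
from typing import Callable, List, Sequence

# Toy stand-in (assumption): each frame is one scalar "feature" instead of a vector.
Frames = List[float]


def quantize(frames: Sequence[float], centroids: Sequence[float]) -> List[int]:
    """Map each frame to the index of its nearest centroid (its discrete "unit")."""
    return [
        min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
        for x in frames
    ]


def unit_disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of frames whose unit differs between two views of one utterance."""
    return sum(u != v for u, v in zip(a, b)) / len(a)


def pseudo_label_round(
    clean: Frames, augment: Callable[[Frames], Frames], centroids: List[float]
) -> List[float]:
    """One round of pseudo-labeling with a hypothetical nearest-centroid "model"."""
    labels = quantize(clean, centroids)  # pseudo-labels come from the clean view
    augmented = augment(clean)  # same content, transformed signal
    new_centroids = []
    for k, old in enumerate(centroids):
        # Refit unit k on clean AND augmented frames that share pseudo-label k.
        members = [x for x, y in zip(clean, labels) if y == k]
        members += [x for x, y in zip(augmented, labels) if y == k]
        # Keep the old centroid if no frame was assigned to unit k.
        new_centroids.append(sum(members) / len(members) if members else old)
    return new_centroids


def shift(frames: Frames) -> Frames:
    """Hypothetical content-preserving transformation: a constant offset."""
    return [x + 0.6 for x in frames]


clean = [0.0, 1.0, 2.0] * 4  # toy utterance cycling through three units
centroids = [0.0, 1.0, 2.0]  # initial quantizer fitted on clean frames only

before = unit_disagreement(quantize(clean, centroids), quantize(shift(clean), centroids))
for _ in range(3):  # a few pseudo-labeling iterations
    centroids = pseudo_label_round(clean, shift, centroids)
after = unit_disagreement(quantize(clean, centroids), quantize(shift(clean), centroids))
print(f"disagreement before: {before:.2f}, after: {after:.2f}")
# prints: disagreement before: 0.67, after: 0.00
```

In this toy, the offset initially pushes most transformed frames into a neighboring unit. After refitting on pseudo-labeled clean and transformed views, both views map to the same units. A real system would presumably train a neural speech encoder on transformed audio rather than move scalar centroids.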

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2209.15483 [cs.CL]
	(or arXiv:2209.15483v1 [cs.CL] for this version)

Submission history

From: Itai Gat
[v1] Fri, 30 Sep 2022 14:15:03 GMT (135kb,D)
[v2] Mon, 29 May 2023 10:50:29 GMT (123kb,D)
