Bridging the Modality Gap for Speech-to-Text Translation

Liu, Yuchen; Zhu, Junnan; Zhang, Jiajun; Zong, Chengqing

(Submitted on 28 Oct 2020)

Abstract: End-to-end speech translation aims to translate speech in one language into text in another language in an end-to-end manner. Most existing methods employ an encoder-decoder structure with a single encoder that learns acoustic representation and semantic information simultaneously. This ignores the differences between the speech and text modalities and overloads the encoder, which makes such a model difficult to train. To address these issues, we propose a Speech-to-Text Adaptation for Speech Translation (STAST) model, which aims to improve end-to-end model performance by bridging the modality gap between speech and text. Specifically, we decouple the speech translation encoder into three parts and introduce a shrink mechanism to match the length of the speech representation with that of the corresponding text transcription. To obtain a better semantic representation, we fully integrate a text-based translation model into STAST so that the two tasks can be trained in the same latent space. Furthermore, we introduce a cross-modal adaptation method to reduce the distance between the speech and text representations. Experimental results on English-French and English-German speech translation corpora show that our model significantly outperforms strong baselines and achieves new state-of-the-art performance.
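
The abstract says a shrink mechanism matches the length of the speech representation to that of the transcription, but it does not say how. As a rough illustration only, the sketch below assumes a CTC-style rule: consecutive frames that share a predicted label are averaged into one vector, and frames predicted as blank are dropped. The function name `ctc_shrink`, the `blank` id, and the averaging rule are all assumptions made for this example, not details confirmed by the abstract.

```python
from itertools import groupby


def ctc_shrink(frames, labels, blank=0):
    """Shrink frame-level vectors to roughly token-level length.

    Hypothetical helper, not taken from the paper.
    frames: list of equal-length lists of floats, one per speech frame.
    labels: per-frame predicted label ids (assumed to be CTC argmax output).
    blank:  id of the blank label (assumed to be 0 here).
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")

    shrunk = []
    start = 0
    # groupby yields one group per run of identical consecutive labels.
    for label, group in groupby(labels):
        run_length = len(list(group))
        run = frames[start:start + run_length]
        start += run_length
        if label == blank:
            continue  # drop runs of blank frames entirely
        dim = len(run[0])
        # Average the vectors in the run, dimension by dimension.
        shrunk.append([sum(vec[d] for vec in run) / run_length for d in range(dim)])
    return shrunk


# Five frames with labels blank, 7, 7, blank, 9 shrink to two vectors.
frames = [[0.0, 0.0], [1.0, 3.0], [3.0, 5.0], [0.0, 0.0], [4.0, 4.0]]
labels = [0, 7, 7, 0, 9]
print(ctc_shrink(frames, labels))  # [[2.0, 4.0], [4.0, 4.0]]
```

Under these assumptions the output has one vector per non-blank label run, so its length tracks the number of predicted tokens rather than the number of acoustic frames.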

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2010.14920 [cs.CL]
	(or arXiv:2010.14920v1 [cs.CL] for this version)

Submission history

From: Yuchen Liu
[v1] Wed, 28 Oct 2020 12:33:04 GMT (7642kb,D)
