Title: ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English

Abstract: We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic-English speech translation corpus. This corpus extends the ArzEn speech corpus, which was collected through informal interviews with bilingual speakers. In this work, we translate the code-switched transcriptions into both monolingual Egyptian Arabic and monolingual English, forming a three-way speech translation corpus. We make the translation guidelines and the corpus publicly available. We also report results for baseline machine translation and speech translation systems. We believe this resource can motivate and facilitate further research on the code-switching phenomenon from a linguistic perspective, and that it can be used to train and evaluate NLP systems.
Comments: Accepted to the Seventh Arabic Natural Language Processing Workshop (WANLP 2022)
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2211.12000 [cs.CL]
  (or arXiv:2211.12000v1 [cs.CL] for this version)

Submission history

From: Injy Hamed
[v1] Tue, 22 Nov 2022 04:37:14 GMT (2667kb,D)