Title: Open Korean Corpora: A Practical Report

Abstract: Korean is often referred to as a low-resource language in the research community. While this claim is partially true, the perception also arises because the available resources are inadequately advertised and curated. This work curates and reviews a list of Korean corpora, first describing institution-level resource development, then iterating through a list of current open datasets for different types of tasks. We then propose a direction for how open-source dataset construction and releases should be done for less-resourced languages to promote research.
Comments: Published in NLP-OSS @EMNLP2020; the May 2023 version adds new datasets
Subjects: Computation and Language (cs.CL)
DOI: 10.18653/v1/2020.nlposs-1.12
Cite as: arXiv:2012.15621 [cs.CL]
  (or arXiv:2012.15621v2 [cs.CL] for this version)

Submission history

From: Won Ik Cho
[v1] Thu, 31 Dec 2020 14:23:55 GMT (43kb,D)
[v2] Tue, 16 May 2023 17:08:24 GMT (61kb,D)
