Title: NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts

Abstract: We introduce NaSGEC, a new dataset to facilitate research on Chinese grammatical error correction (CGEC) for native speaker texts from multiple domains. Previous CGEC research primarily focuses on correcting texts from a single domain, especially learner essays. To broaden the target domain, we annotate multiple references for 12,500 sentences from three native domains, i.e., social media, scientific writing, and examination. We provide solid benchmark results for NaSGEC by employing cutting-edge CGEC models and different training data. We further perform detailed analyses of the connections and gaps between our domains from both empirical and statistical views. We hope this work can inspire future studies on an important but under-explored direction: cross-domain GEC.
Comments: Accepted by ACL 2023 (Findings, long paper)
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2305.16023 [cs.CL]
  (or arXiv:2305.16023v1 [cs.CL] for this version)

Submission history

From: Yue Zhang
[v1] Thu, 25 May 2023 13:05:52 GMT (791kb,D)
