Title: Are UD Treebanks Getting More Consistent? A Report Card for English UD

Abstract: Recent efforts to consolidate guidelines and treebanks in the Universal Dependencies project raise the expectation that joint training and dataset comparison are increasingly possible for high-resource languages such as English, which have multiple corpora. Focusing on the two largest UD English treebanks, we examine progress in data consolidation and answer several questions: Are UD English treebanks becoming more internally consistent? Are they becoming more like each other, and to what extent? Is joint training a good idea, and if so, since which UD version? Our results indicate that while consolidation has made progress, joint models may still suffer from inconsistencies, which hamper their ability to leverage a larger pool of training data.
Comments: Proceedings of the Sixth Workshop on Universal Dependencies (UDW 2023)
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2302.00636 [cs.CL]
  (or arXiv:2302.00636v1 [cs.CL] for this version)

Submission history

From: Amir Zeldes
[v1] Wed, 1 Feb 2023 17:58:28 GMT (5604kb,D)
