Title: Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories

Abstract: Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories' internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.
Comments: Accepted to appear in TACL; Authors' final version, pre-MIT Press publication
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2012.01285 [cs.CL]
  (or arXiv:2012.01285v2 [cs.CL] for this version)

Submission history

From: Jakob Prange
[v1] Wed, 2 Dec 2020 15:51:36 GMT (1618kb,D)
[v2] Fri, 11 Dec 2020 15:10:25 GMT (1677kb,D)