Title: Clustering Contextualized Representations of Text for Unsupervised Syntax Induction

Abstract: We explore clustering of contextualized text representations for two unsupervised syntax induction tasks: part-of-speech induction (POSI) and constituency labelling (CoLab). We propose a deep embedded clustering approach that jointly transforms these representations into a lower-dimensional, cluster-friendly space and clusters them. We further enhance these representations by augmenting them with task-specific representations. We also explore the effectiveness of multilingual representations for different tasks and languages. With this work, we establish the first strong baselines for unsupervised syntax induction using contextualized text representations. We report competitive performance on 45-tag POSI, state-of-the-art performance on 12-tag POSI across 10 languages, and competitive results on CoLab.
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2010.12784 [cs.CL]
  (or arXiv:2010.12784v1 [cs.CL] for this version)
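
The abstract names a deep embedded clustering approach but does not give its formulation. As a rough illustration only, the sketch below shows the clustering step as it appears in the generic deep embedded clustering recipe: a Student's t soft assignment of points to centroids, a sharpened target distribution, and a KL-divergence objective. It is an assumption that the paper follows this recipe; the function names (`soft_assign`, `target_distribution`, `kl_divergence`) are hypothetical, the 2-D points are toy stand-ins for projected contextualized representations, and the learned projection into the low-dimensional space is omitted.

```python
import math


def soft_assign(points, centroids, alpha=1.0):
    """Student's t soft assignment q[i][j] of point i to cluster j."""
    q = []
    for z in points:
        row = []
        for mu in centroids:
            d2 = sum((a - b) ** 2 for a, b in zip(z, mu))  # squared distance
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # normalize over clusters
    return q


def target_distribution(q):
    """Sharpened targets: square q, divide by soft cluster size, renormalize."""
    freq = [sum(row[j] for row in q) for j in range(len(q[0]))]
    p = []
    for row in q:
        w = [v * v / f for v, f in zip(row, freq)]
        total = sum(w)
        p.append([v / total for v in w])
    return p


def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return sum(
        pv * math.log(pv / qv)
        for p_row, q_row in zip(p, q)
        for pv, qv in zip(p_row, q_row)
    )


# Toy 2-D stand-ins for projected representations (assumed, not from the paper).
points = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
centroids = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(points, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
# Hard cluster label per point: index of the largest soft assignment.
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
```

In a full pipeline of this kind, `loss` would be minimized by gradient descent with respect to both the centroids and the parameters of the projection, which is what makes the transformation and the clustering joint.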

Submission history

From: Vikram Gupta
[v1] Sat, 24 Oct 2020 05:06:29 GMT (1652kb,D)
