Title: Language Modeling with Reduced Densities

Abstract: This work originates from the observation that today's state-of-the-art statistical language models are impressive not only for their performance, but also - and quite crucially - because they are built entirely from correlations in unstructured text data. The latter observation prompts a fundamental question that lies at the heart of this paper: What mathematical structure exists in unstructured text data? We put forth enriched category theory as a natural answer. We show that sequences of symbols from a finite alphabet, such as those found in a corpus of text, form a category enriched over probabilities. We then address a second fundamental question: How can this information be stored and modeled in a way that preserves the categorical structure? We answer this by constructing a functor from our enriched category of text to a particular enriched category of reduced density operators. The latter leverages the Loewner order on positive semidefinite operators, which can further be interpreted as a toy example of entailment.
Comments: 21 pages; v2: added reference; v3: revised abstract and introduction for clarity; v4: Compositionality version
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Category Theory (math.CT); Quantum Physics (quant-ph)
Journal reference: Compositionality 3, 4 (2021)
DOI: 10.32408/compositionality-3-4
Cite as: arXiv:2007.03834 [cs.CL]
  (or arXiv:2007.03834v4 [cs.CL] for this version)

Submission history

From: Tai-Danae Bradley
[v1] Wed, 8 Jul 2020 00:41:53 GMT (25kb)
[v2] Sat, 21 Nov 2020 00:24:05 GMT (25kb)
[v3] Wed, 30 Jun 2021 12:28:19 GMT (36kb)
[v4] Sat, 27 Nov 2021 15:41:31 GMT (74kb,D)
