Language Modeling with Reduced Densities

Bradley, Tai-Danae; Vlassopoulos, Yiannis

doi:10.32408/compositionality-3-4

(Submitted on 8 Jul 2020 (v1), last revised 27 Nov 2021 (this version, v4))

Abstract: This work originates from the observation that today's state-of-the-art statistical language models are impressive not only for their performance, but also, and quite crucially, because they are built entirely from correlations in unstructured text data. The latter observation prompts a fundamental question that lies at the heart of this paper: What mathematical structure exists in unstructured text data? We put forth enriched category theory as a natural answer. We show that sequences of symbols from a finite alphabet, such as those found in a corpus of text, form a category enriched over probabilities. We then address a second fundamental question: How can this information be stored and modeled in a way that preserves the categorical structure? We answer this by constructing a functor from our enriched category of text to a particular enriched category of reduced density operators. The latter leverages the Loewner order on positive semidefinite operators, which can further be interpreted as a toy example of entailment.

Comments:	21 pages; v2: added reference; v3: revised abstract and introduction for clarity; v4: Compositionality version
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Category Theory (math.CT); Quantum Physics (quant-ph)
Journal reference:	Compositionality 3, 4 (2021)
DOI:	10.32408/compositionality-3-4
Cite as:	arXiv:2007.03834 [cs.CL]
	(or arXiv:2007.03834v4 [cs.CL] for this version)

Submission history

From: Tai-Danae Bradley
[v1] Wed, 8 Jul 2020 00:41:53 GMT (25kb)
[v2] Sat, 21 Nov 2020 00:24:05 GMT (25kb)
[v3] Wed, 30 Jun 2021 12:28:19 GMT (36kb)
[v4] Sat, 27 Nov 2021 15:41:31 GMT (74kb,D)
