Title: Topic Modeling in the Voynich Manuscript

Abstract: This article presents the results of investigations using topic modeling of the Voynich Manuscript (Beinecke MS408). Topic modeling is a set of computational methods used to identify clusters of subjects within text. We use latent Dirichlet allocation, latent semantic analysis, and nonnegative matrix factorization to cluster Voynich pages into 'topics'. We then compare the topics derived from the computational models with clusters derived from the Voynich illustrations and from paleographic analysis. We find that the computationally derived clusters closely match a conjunction of scribe and subject matter (as per the illustrations), providing further evidence that the Voynich Manuscript contains meaningful text.
Comments: See this https URL for a version that has the Voynich font (and better figure placement), since arXiv does not allow XeLaTeX compilation
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2107.02858 [cs.CL]
  (or arXiv:2107.02858v1 [cs.CL] for this version)

Submission history

From: Claire Bowern
[v1] Tue, 6 Jul 2021 19:50:03 GMT (10136kb,D)
