Title: Topological Data Analysis on Simple English Wikipedia Articles

Abstract: Single-parameter persistent homology, a key tool in topological data analysis, has been widely applied to data problems along with statistical techniques that quantify the significance of the results. In contrast, statistical techniques for two-parameter persistence, while highly desirable for real-world applications, have scarcely been considered. We present three statistical approaches for comparing geometric data using two-parameter persistent homology; these approaches rely on the Hilbert function, matching distance, and barcodes obtained from two-parameter persistence modules computed from the point-cloud data. Our statistical methods are broadly applicable for analysis of geometric data indexed by a real-valued parameter. We apply these approaches to analyze high-dimensional point-cloud data obtained from Simple English Wikipedia articles. In particular, we show how our methods can be utilized to distinguish certain subsets of the Wikipedia data and to compare with random data. These results yield insights into the construction of null distributions and stability of our methods with respect to noisy data.
Comments: 17 pages, 13 figures
Subjects: Algebraic Topology (math.AT)
MSC classes: 55N31, 62R40
Cite as: arXiv:2007.00063 [math.AT]
  (or arXiv:2007.00063v2 [math.AT] for this version)

Submission history

From: Matthew Wright
[v1] Tue, 30 Jun 2020 18:54:16 GMT (611kb,D)
[v2] Fri, 11 Dec 2020 18:27:28 GMT (612kb,D)
