Title: A Topological Method for Comparing Document Semantics

Abstract: Comparing document semantics is one of the toughest tasks in both Natural Language Processing and Information Retrieval. To date, tools for this task remain rare, and most relevant methods are devised from a statistical or vector space model perspective, with almost none taking a topological perspective. In this paper, we take a different approach: we propose a novel algorithm based on topological persistence for comparing the semantic similarity of two documents. Our experiments are conducted on a document dataset with human judges' results, and a collection of state-of-the-art methods is selected for comparison. The experimental results show that our algorithm produces highly human-consistent results and outperforms most state-of-the-art methods, though it ties with NLTK.
Comments: 9 pages, 3 tables, 9th International Conference on Natural Language Processing (NLP 2020)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Journal reference: pp. 143-151, 2020. CS & IT - CSCP 2020
DOI: 10.5121/csit.2020.101411
Cite as: arXiv:2012.04203 [cs.CL]
  (or arXiv:2012.04203v1 [cs.CL] for this version)

Submission history

From: Yuqi Kong
[v1] Tue, 8 Dec 2020 04:21:40 GMT (494kb)
