Title: Measuring the Measuring Tools: An Automatic Evaluation of Semantic Metrics for Text Corpora

Abstract: The ability to compare the semantic similarity between text corpora is important in a variety of natural language processing applications. However, standard methods for evaluating corpus-level similarity metrics have yet to be established. We propose a set of automatic and interpretable measures for assessing the characteristics of corpus-level semantic similarity metrics, allowing sensible comparison of their behavior. We demonstrate the effectiveness of our evaluation measures in capturing fundamental characteristics by applying them to a collection of classical and state-of-the-art metrics. Our measures reveal that recently developed metrics are becoming better at identifying semantic distributional mismatch, while classical metrics are more sensitive to perturbations at the surface level of the text.
Comments: Published at the GEM workshop at the Empirical Methods in Natural Language Processing (EMNLP) conference in 2022
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2211.16259 [cs.CL]
  (or arXiv:2211.16259v1 [cs.CL] for this version)

Submission history

From: Samuel Ackerman
[v1] Tue, 29 Nov 2022 14:47:07 GMT (17885kb,D)
