Title: Evaluating Machine Common Sense via Cloze Testing

Abstract: Language models (LMs) show state-of-the-art performance on common sense (CS) question answering, but whether this ability implies human-level mastery of CS remains an open question. Understanding the limitations and strengths of LMs can help researchers improve these models, potentially by developing novel ways of integrating external CS knowledge. We devise a series of tests and measurements to systematically quantify LM performance on different aspects of CS. We propose using cloze testing combined with word embeddings to measure an LM's robustness and confidence. Our results show that although language models tend to achieve human-like accuracy, their confidence is subpar. Future work can leverage this information to build more complex systems, such as an ensemble of symbolic and distributed knowledge.
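The abstract does not spell out the scoring procedure, so the following is a hypothetical illustration only, not the authors' method. It assumes an LM has already returned a ranked list of candidate fillers with probabilities for one cloze blank (e.g. "You can use a ___ to cut bread."), and that word embeddings are available as plain vectors. Accuracy is taken as a top-1 match with the gold answer; the confidence proxy is an invented one: the share of probability mass the LM places on fillers whose embedding has cosine similarity of at least `threshold` to the gold answer. The function name `cloze_scores`, the `0.8` threshold, and the toy two-dimensional vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cloze_scores(predictions, gold, embeddings, threshold=0.8):
    """Hypothetical scoring of one cloze item.

    predictions: list of (word, probability) pairs, highest probability first.
    gold: the expected filler word.
    embeddings: dict mapping words to vectors (assumed, illustrative).
    Returns (top-1 correct?, share of probability mass on near-gold fillers).
    """
    correct = predictions[0][0] == gold
    gold_vec = embeddings[gold]
    # Probability mass on fillers semantically close to the gold answer.
    near_mass = sum(
        p for word, p in predictions
        if word in embeddings and cosine(embeddings[word], gold_vec) >= threshold
    )
    total_mass = sum(p for _, p in predictions)
    return correct, near_mass / total_mass

# Toy data, invented for illustration: not real embeddings or LM outputs.
embeddings = {
    "knife": [1.0, 0.0],
    "blade": [0.9, 0.1],
    "spoon": [0.0, 1.0],
    "fork": [0.2, 0.9],
}
predictions = [("knife", 0.4), ("spoon", 0.3), ("blade", 0.2), ("fork", 0.1)]

if __name__ == "__main__":
    print(cloze_scores(predictions, "knife", embeddings))
```

Under these toy inputs the top-1 guess is correct, but only about 60% of the probability mass sits on "knife"-like fillers, which is the kind of gap between accuracy and confidence the abstract describes.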
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2201.07902 [cs.CL]
  (or arXiv:2201.07902v1 [cs.CL] for this version)

Submission history

From: Lee Kezar
[v1] Wed, 19 Jan 2022 23:00:41 GMT (437kb,D)