Evaluating Machine Common Sense via Cloze Testing

Qasemi, Ehsan; Kezar, Lee; Pujara, Jay; Szekely, Pedro

(Submitted on 19 Jan 2022)

Abstract: Language models (LMs) show state-of-the-art performance for common sense (CS) question answering, but whether this ability implies a human-level mastery of CS remains an open question. Understanding the limitations and strengths of LMs can help researchers improve these models, potentially by developing novel ways of integrating external CS knowledge. We devise a series of tests and measurements to systematically quantify LM performance on different aspects of CS. We propose the use of cloze testing combined with word embeddings to measure the LM's robustness and confidence. Our results show that although language models tend to achieve human-like accuracy, their confidence is subpar. Future work can leverage this information to build more complex systems, such as an ensemble of symbolic and distributed knowledge.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2201.07902 [cs.CL]
	(or arXiv:2201.07902v1 [cs.CL] for this version)

Submission history

From: Lee Kezar
[v1] Wed, 19 Jan 2022 23:00:41 GMT (437kb,D)
