Title: Evaluating Explanations: How much do explanations from the teacher aid students?

Abstract: While many methods purport to explain predictions by highlighting salient features, what precise aims these explanations serve and how to evaluate their utility are often unstated. In this work, we formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning to simulate the teacher model on unseen examples for which explanations are unavailable. Student models incorporate explanations in training (but not prediction) procedures. Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions. Using our framework, we compare multiple attribution methods and observe consistent and quantitative differences amongst them across multiple learning strategies.
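As a rough illustration, the quantity described in the abstract can be sketched as a difference in simulation accuracy. It compares two students on held-out examples where no explanations are given. The first student was trained with the teacher's explanations. The second was trained without them. Each student's score is how often it matches the teacher's output. This is a minimal sketch under our own assumptions. The function names `simulation_accuracy` and `explanation_value` and the use of a plain accuracy difference are hypothetical, not taken from the paper. Training the students is not shown: their predictions are assumed to be precomputed label lists.

```python
from typing import Sequence


def simulation_accuracy(student_preds: Sequence[int], teacher_preds: Sequence[int]) -> float:
    """Fraction of held-out examples on which the student reproduces the teacher's label.

    Hypothetical helper: assumes both lists are aligned by example.
    """
    if len(student_preds) != len(teacher_preds) or len(teacher_preds) == 0:
        raise ValueError("prediction lists must be non-empty and of equal length")
    matches = sum(s == t for s, t in zip(student_preds, teacher_preds))
    return matches / len(teacher_preds)


def explanation_value(
    with_expl_preds: Sequence[int],
    without_expl_preds: Sequence[int],
    teacher_preds: Sequence[int],
) -> float:
    """Assumed metric: gain in simulation accuracy from training with explanations.

    Both students predict WITHOUT explanations at test time; explanations were
    only available during training of the first student.
    """
    return (
        simulation_accuracy(with_expl_preds, teacher_preds)
        - simulation_accuracy(without_expl_preds, teacher_preds)
    )


if __name__ == "__main__":
    teacher = [1, 0, 1, 1, 0]           # teacher labels on unseen examples
    student_with = [1, 0, 1, 0, 0]      # student trained with explanations
    student_without = [1, 1, 0, 0, 0]   # student trained without explanations
    print(explanation_value(student_with, student_without, teacher))
```

A positive value means the explanations helped the student simulate the teacher. Under this framing, an attribution method is judged by how much it raises the score, rather than by how plausible its highlights look.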
Comments: Preprint
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2012.00893 [cs.CL]
  (or arXiv:2012.00893v1 [cs.CL] for this version)

Submission history

From: Danish Pruthi
[v1] Tue, 1 Dec 2020 23:40:21 GMT (633kb,D)
[v2] Fri, 17 Dec 2021 04:50:55 GMT (494kb,D)