Computer Science > Computation and Language
Title: Evaluating Explanations: How much do explanations from the teacher aid students?
(Submitted on 1 Dec 2020 (this version), latest version 17 Dec 2021 (v2))
Abstract: While many methods purport to explain predictions by highlighting salient features, what precise aims these explanations serve and how to evaluate their utility are often unstated. In this work, we formalize the value of explanations using a student-teacher paradigm that measures the extent to which explanations improve student models in learning to simulate the teacher model on unseen examples for which explanations are unavailable. Student models incorporate explanations in training (but not prediction) procedures. Unlike many prior proposals to evaluate explanations, our approach cannot be easily gamed, enabling principled, scalable, and automatic evaluation of attributions. Using our framework, we compare multiple attribution methods and observe consistent and quantitative differences amongst them across multiple learning strategies.
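As a rough illustration of the measurement the abstract describes, the sketch below scores a student by how often it reproduces the teacher's predictions on held-out examples, then takes the difference between a student trained with explanations and one trained without. The function names (`simulation_accuracy`, `explanation_value`) and the toy label lists are hypothetical and chosen only for illustration; the paper's actual metric, models, and training procedures may differ.

```python
from typing import Sequence


def simulation_accuracy(student_preds: Sequence[int],
                        teacher_preds: Sequence[int]) -> float:
    """Fraction of held-out examples where the student matches the teacher."""
    if len(student_preds) != len(teacher_preds):
        raise ValueError("prediction lists must have the same length")
    if len(teacher_preds) == 0:
        raise ValueError("need at least one held-out example")
    matches = sum(s == t for s, t in zip(student_preds, teacher_preds))
    return matches / len(teacher_preds)


def explanation_value(student_with_expl: Sequence[int],
                      student_without_expl: Sequence[int],
                      teacher_preds: Sequence[int]) -> float:
    """Gain in simulation accuracy attributable to training with explanations.

    Assumption: both students are evaluated on the same unseen examples,
    and neither sees explanations at prediction time.
    """
    return (simulation_accuracy(student_with_expl, teacher_preds)
            - simulation_accuracy(student_without_expl, teacher_preds))


# Toy, made-up predictions on eight unseen examples (not data from the paper).
teacher = [1, 0, 1, 1, 0, 1, 0, 0]
student_plain = [1, 0, 0, 1, 1, 1, 0, 1]  # trained without explanations
student_expl = [1, 0, 1, 1, 1, 1, 0, 0]   # trained with explanations

gain = explanation_value(student_expl, student_plain, teacher)
print(f"simulation accuracy gain: {gain:.3f}")
```

Under this reading, an attribution method whose explanations yield a larger gain would be judged more useful to the student; comparing gains across methods is one way to picture the comparison the abstract reports.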
Submission history
From: Danish Pruthi
[v1] Tue, 1 Dec 2020 23:40:21 GMT (633kb,D)
[v2] Fri, 17 Dec 2021 04:50:55 GMT (494kb,D)