Computer Science > Machine Learning
Title: A Framework to Learn with Interpretation
(Submitted on 19 Oct 2020 (v1), last revised 23 Feb 2022 (this version, v4))
Abstract: To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive model in terms of human-understandable high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the activation of attributes with an entropy-based criterion while enforcing fidelity to both inputs and outputs of the predictive model. A detailed pipeline to visualize the learnt features is also developed. Moreover, besides generating interpretable models by design, our approach can be specialized to provide post-hoc interpretations for a pre-trained neural network. We validate our approach against several state-of-the-art methods on multiple datasets and show its efficacy on both kinds of tasks.
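The interpreter described in the abstract can be illustrated with a minimal single-sample sketch. It is not the authors' implementation: the names `Psi` (attribute dictionary), `Theta` (linear classifier), `D` (decoder used for input fidelity), the ReLU activation, the exact form of each penalty, and the weights `beta` and `gamma` are all assumptions chosen for illustration. The sketch only shows how attributes computed from a hidden layer can feed a linear classifier, and how output fidelity, input fidelity, and an entropy-based conciseness term could be combined.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def entropy(p):
    """Shannon entropy of a probability vector (zero entries skipped)."""
    return -sum(v * math.log(v) for v in p if v > 0)


def interpreter_loss(hidden, f_probs, x, Psi, Theta, D, beta=0.1, gamma=0.1):
    """Illustrative loss for one sample (all penalty forms are assumptions).

    hidden  : output of a selected hidden layer of the predictor
    f_probs : class probabilities produced by the predictor
    x       : the original input, used for the input-fidelity term
    """
    # Attribute functions: non-negative activations computed from the hidden layer.
    phi = [max(0.0, a) for a in matvec(Psi, hidden)]

    # The attributes feed a linear classifier (the interpreter's output).
    g_probs = softmax(matvec(Theta, phi))

    # Output fidelity: cross-entropy between predictor and interpreter outputs.
    out_fid = -sum(p * math.log(q + 1e-12) for p, q in zip(f_probs, g_probs))

    # Input fidelity: mean squared error of reconstructing x from the attributes.
    x_hat = matvec(D, phi)
    in_fid = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

    # Conciseness: entropy of normalized activations (low when few attributes fire).
    total = sum(phi)
    concise = entropy([v / total for v in phi]) if total > 0 else 0.0

    return {
        "attributes": phi,
        "output_fidelity": out_fid,
        "input_fidelity": in_fid,
        "conciseness": concise,
        "total": out_fid + beta * in_fid + gamma * concise,
    }
```

In this sketch, a sample that activates a single attribute has zero conciseness penalty, while a sample that spreads its activation evenly over all attributes receives the maximum penalty, which is the behaviour an entropy-based criterion is meant to encourage.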
Submission history
From: Jayneel Parekh
[v1] Mon, 19 Oct 2020 09:26:28 GMT (5593kb,D)
[v2] Wed, 13 Jan 2021 18:44:17 GMT (5593kb,D)
[v3] Sun, 6 Jun 2021 14:21:35 GMT (16981kb,D)
[v4] Wed, 23 Feb 2022 13:29:44 GMT (22492kb,D)