Computer Science > Machine Learning
Title: Evaluations and Methods for Explanation through Robustness Analysis
(Submitted on 31 May 2020 (this version), latest version 8 Apr 2021 (v2))
Abstract: Among the many ways of interpreting a machine learning model, measuring the importance of a set of features tied to a prediction is probably one of the most intuitive. In this paper, we establish the link between a set of features and a prediction through a new evaluation criterion, robustness analysis, which measures the minimum distortion distance of an adversarial perturbation. By measuring the tolerance level for an adversarial attack, we can extract a set of features that provides the most robust support for a prediction. By setting up a targeted adversarial attack, we can also extract a set of features that contrasts the current prediction with a target class. Applying this methodology to various prediction tasks across multiple domains, we observe that the derived explanations do capture the significant feature set, both qualitatively and quantitatively.
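The abstract does not spell out the computation, so the following is only a minimal sketch of the idea under strong assumptions. It assumes a linear classifier, for which the minimum L2 adversarial perturbation restricted to a subset of features has a closed form (distance to the decision boundary), and it assumes a greedy search for the feature set. The paper's actual models, distance norm, attack procedure, and search strategy may differ. All function names here are hypothetical. One reading of "most robust support" is used: keep the chosen features fixed, let an adversary perturb only the remaining ones, and prefer the set that makes the required perturbation largest. The targeted variant only measures how far the target class is from overtaking the current class. It does not require the target to beat every other class.

```python
import math
from typing import List, Sequence


def min_perturbation_linear(w: Sequence[float], b: float, x: Sequence[float],
                            free: Sequence[int]) -> float:
    """Hypothetical helper (assumes a binary linear model w.x + b).

    Smallest L2 perturbation, touching only coordinates in `free`, that moves
    the score onto the decision boundary: |w.x + b| / ||w_free||_2.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(w[i] ** 2 for i in free))
    if norm == 0.0:
        return math.inf  # perturbing only these features cannot change the score
    return abs(score) / norm


def greedy_robust_set(w: Sequence[float], b: float, x: Sequence[float],
                      k: int) -> List[int]:
    """Hypothetical greedy search: grow the fixed set one feature at a time,
    each time picking the feature whose fixing maximizes the minimum
    perturbation needed when only the remaining features may be attacked."""
    d = len(x)
    chosen: List[int] = []
    for _ in range(k):
        best_j, best_r = -1, -1.0
        for j in range(d):
            if j in chosen:
                continue
            free = [i for i in range(d) if i not in chosen and i != j]
            r = min_perturbation_linear(w, b, x, free)
            if r > best_r:
                best_j, best_r = j, r
        chosen.append(best_j)
    return chosen


def min_targeted_perturbation_linear(W: Sequence[Sequence[float]],
                                     bias: Sequence[float], x: Sequence[float],
                                     current: int, target: int,
                                     free: Sequence[int]) -> float:
    """Hypothetical targeted variant for a multiclass linear model: smallest
    L2 perturbation on `free` that lets class `target` tie class `current`
    (a pairwise condition, not full argmax)."""
    diff = [wc - wt for wc, wt in zip(W[current], W[target])]
    margin = sum(di * xi for di, xi in zip(diff, x)) + bias[current] - bias[target]
    if margin <= 0.0:
        return 0.0  # target already scores at least as high as current
    norm = math.sqrt(sum(diff[i] ** 2 for i in free))
    if norm == 0.0:
        return math.inf
    return margin / norm


if __name__ == "__main__":
    w, b, x = [3.0, 0.5, -2.0], 0.0, [1.0, 1.0, 1.0]
    print(greedy_robust_set(w, b, x, 2))  # -> [0, 2]
    print(min_perturbation_linear(w, b, x, [1]))  # -> 3.0
```

In the usage example, fixing features 0 and 2 (the largest-magnitude weights) leaves only feature 1 open to attack, which needs a perturbation of 1.5 / 0.5 = 3.0 to reach the boundary. A greedy search is used because scoring every subset is combinatorial. For nonlinear models the closed form would be replaced by an iterative adversarial attack.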
Submission history
From: Cheng-Yu Hsieh
[v1] Sun, 31 May 2020 05:52:05 GMT (7070kb,D)
[v2] Thu, 8 Apr 2021 21:18:01 GMT (8647kb,D)