Statistics > Machine Learning
Title: Explaining the data or explaining a model? Shapley values that uncover non-linear dependencies
(Submitted on 12 Jul 2020 (v1), last revised 6 Mar 2021 (this version, v4))
Abstract: Shapley values have become increasingly popular in the machine learning literature thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of 'fairness'. The flexibility arises from the myriad potential forms of the Shapley value game formulation. One consequence of this flexibility is that many types of Shapley values are now being discussed, and this variety is a potential source of misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a (non-parametric) measure of non-linear dependence as the characteristic function. Their strength lies in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlation, affine-invariant distance correlation, and Hilbert-Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a classical medical survey data set.
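The core idea of a Shapley decomposition whose characteristic function measures dependence between a subset of features and the response, rather than a model's output, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes v(S) = dCor(X_S, Y) with v(∅) = 0, uses the standard sample (V-statistic) distance correlation of Székely, Rizzo and Bakirov, and enumerates all feature subsets exactly, which is only feasible for a handful of features. The names `distance_correlation` and `shapley_values` are hypothetical. The affine-invariant distance correlation and HSIC variants could be passed in as alternative `value` functions, but they are not implemented here.

```python
import itertools
import math

import numpy as np


def _double_centered_distances(z: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distance matrix of the rows of z, double-centred."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # treat a 1-D sample as n observations of one variable
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    # Subtract column means and row means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Sample distance correlation (V-statistic form) between x and y."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    dcov2 = (a * b).mean()                      # squared distance covariance
    dvar2 = (a * a).mean() * (b * b).mean()     # product of squared distance variances
    if dvar2 <= 0:
        return 0.0  # a constant sample has zero distance variance
    return math.sqrt(max(dcov2, 0.0) / math.sqrt(dvar2))


def shapley_values(X: np.ndarray, y: np.ndarray, value=distance_correlation) -> np.ndarray:
    """Exact Shapley values by subset enumeration.

    Assumption (hypothetical formulation): v(S) = value(X[:, S], y), v(empty set) = 0.
    """
    n_features = X.shape[1]
    cache = {(): 0.0}  # memoise v(S); keys are sorted tuples of column indices

    def v(subset):
        if subset not in cache:
            cache[subset] = value(X[:, list(subset)], y)
        return cache[subset]

    phi = np.zeros(n_features)
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(len(others) + 1):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = (math.factorial(size) * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for s in itertools.combinations(others, size):
                with_j = tuple(sorted(s + (j,)))
                phi[j] += weight * (v(with_j) - v(s))  # marginal contribution of j
    return phi


# Usage on synthetic data: x1 enters y only through x1**2, x3 is pure noise.
rng = np.random.default_rng(0)
n = 200
x1 = rng.uniform(-1, 1, n)
x2 = rng.uniform(-1, 1, n)
x3 = rng.normal(size=n)
y = x1 ** 2 + 0.5 * x2 + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
phi = shapley_values(X, y)
print(phi, phi.sum(), distance_correlation(X, y))
```

Because the Shapley value satisfies efficiency, the attributions sum to v of the full feature set, here dCor(X, y), which gives a simple sanity check. Since x1 affects y only through x1**2, a linear (Pearson) correlation would largely miss its contribution, whereas distance correlation is designed to detect such non-linear dependence.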
Submission history
From: Inga Strümke
[v1] Sun, 12 Jul 2020 15:04:59 GMT (712kb,D)
[v2] Thu, 16 Jul 2020 08:31:29 GMT (712kb,D)
[v3] Tue, 28 Jul 2020 09:11:04 GMT (712kb,D)
[v4] Sat, 6 Mar 2021 05:46:11 GMT (723kb,D)