Title: Explaining the data or explaining a model? Shapley values that uncover non-linear dependencies

Abstract: Shapley values have become increasingly popular in the machine learning literature thanks to their attractive axiomatisation, flexibility, and uniqueness in satisfying certain notions of 'fairness'. The flexibility arises from the myriad potential forms of the Shapley value game formulation. One consequence of this flexibility is that many types of Shapley values are now being discussed, and this variety is a potential source of misunderstanding. To the best of our knowledge, all existing game formulations in the machine learning and statistics literature fall into a category which we name the model-dependent category of game formulations. In this work, we consider an alternative and novel formulation which leads to the first instance of what we call model-independent Shapley values. These Shapley values use a (non-parametric) measure of non-linear dependence as the characteristic function. The strength of these Shapley values is in their ability to uncover and attribute non-linear dependencies amongst features. We introduce and demonstrate the use of the energy distance correlation, the affine-invariant distance correlation, and the Hilbert-Schmidt independence criterion as Shapley value characteristic functions. In particular, we demonstrate their potential value for exploratory data analysis and model diagnostics. We conclude with an interesting expository application to a classical medical survey data set.
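To make the idea of a dependence measure acting as the characteristic function concrete, the following is a minimal sketch and not the authors' implementation. It assumes the value of a feature subset S is the sample distance correlation between the label and the features in S, with the empty set valued at 0, and it enumerates all subsets exactly, so it is only practical for a handful of features. The function names (`distance_correlation`, `shapley_values`) and the synthetic data are hypothetical and chosen for illustration.

```python
import itertools
import math

import numpy as np


def _centered_distances(z):
    """Pairwise Euclidean distance matrix of the rows of z, double-centered."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (rows are observations)."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = (a * b).mean()
    denom = math.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else math.sqrt(max(dcov2, 0.0) / denom)


def shapley_values(X, y, value=distance_correlation):
    """Exact Shapley values with v(S) = value(X[:, S], y) and v(empty set) = 0 (assumed)."""
    n_features = X.shape[1]
    cache = {(): 0.0}

    def v(subset):
        if subset not in cache:
            cache[subset] = value(X[:, list(subset)], y)
        return cache[subset]

    phi = np.zeros(n_features)
    for j in range(n_features):
        others = [k for k in range(n_features) if k != j]
        for size in range(n_features):
            # Standard Shapley weight |S|! (d - |S| - 1)! / d!
            weight = (math.factorial(size) * math.factorial(n_features - size - 1)
                      / math.factorial(n_features))
            for subset in itertools.combinations(others, size):
                with_j = tuple(sorted(subset + (j,)))
                phi[j] += weight * (v(with_j) - v(subset))
    return phi


# Illustrative synthetic data: y depends non-linearly on feature 0 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=150)
phi = shapley_values(X, y)
```

No model is fitted anywhere in this sketch: the attribution is computed from the data alone, which is the sense in which such Shapley values are model-independent. By the efficiency property, the entries of `phi` sum to the distance correlation between `y` and all features jointly.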
Comments: 26 pages, 7 figures, 2 tables
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2007.06011 [stat.ML]
  (or arXiv:2007.06011v4 [stat.ML] for this version)

Submission history

From: Inga Strümke
[v1] Sun, 12 Jul 2020 15:04:59 GMT (712kb,D)
[v2] Thu, 16 Jul 2020 08:31:29 GMT (712kb,D)
[v3] Tue, 28 Jul 2020 09:11:04 GMT (712kb,D)
[v4] Sat, 6 Mar 2021 05:46:11 GMT (723kb,D)
