We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

stat.ML

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Statistics > Machine Learning

Title: Computing AIC for black-box models using Generalised Degrees of Freedom: a comparison with cross-validation

Abstract: Generalised Degrees of Freedom (GDF), as defined by Ye (1998 JASA 93:120-131), represent the sensitivity of model fits to perturbations of the data. As such they can be computed for any statistical model, making it possible, in principle, to derive the number of parameters in machine-learning approaches. Defined originally for normally distributed data only, we here investigate the potential of this approach for Bernoulli-data. GDF-values for models of simulated and real data are compared to model complexity-estimates from cross-validation. Similarly, we computed GDF-based AICc for randomForest, neural networks and boosted regression trees and demonstrated its similarity to cross-validation. GDF-estimates for binary data were unstable and inconsistently sensitive to the number of data points perturbed simultaneously, while at the same time being extremely computer-intensive in their calculation. Repeated 10-fold cross-validation was more robust, based on fewer assumptions and faster to compute. Our findings suggest that the GDF-approach does not readily transfer to Bernoulli data and a wider range of regression approaches.
Comments: accompanying R-code on github
Subjects: Machine Learning (stat.ML)
Cite as: arXiv:1603.02743 [stat.ML]
  (or arXiv:1603.02743v1 [stat.ML] for this version)

Submission history

From: Carsten Dormann [view email]
[v1] Wed, 9 Mar 2016 00:01:18 GMT (329kb,D)

Link back to: arXiv, form interface, contact.