Title: Cross-Validation for Correlated Data

Abstract: K-fold cross-validation (CV) with squared error loss is widely used for evaluating predictive models, especially when strong distributional assumptions cannot be made. However, CV with squared error loss is not free from distributional assumptions, in particular in cases involving non-i.i.d. data. This paper analyzes CV for correlated data. We present a criterion for the suitability of standard CV in the presence of correlations. When this criterion does not hold, we introduce a bias-corrected cross-validation estimator, which we term $CV_c$, that yields an unbiased estimate of prediction error in many settings where standard CV is invalid. We also demonstrate our results numerically and find that introducing our correction substantially improves both model evaluation and model selection in simulations and real-data studies.
Subjects: Methodology (stat.ME); Machine Learning (stat.ML)
Cite as: arXiv:1904.02438 [stat.ME]
  (or arXiv:1904.02438v3 [stat.ME] for this version)

Submission history

From: Assaf Rabinowicz
[v1] Thu, 4 Apr 2019 09:58:35 GMT (99kb,D)
[v2] Sun, 12 May 2019 10:26:05 GMT (50kb,D)
[v3] Fri, 24 Apr 2020 15:38:35 GMT (263kb,D)
