Statistics > Methodology
Title: Cross-Validation for Correlated Data
(Submitted on 4 Apr 2019 (v1), last revised 24 Apr 2020 (this version, v3))
Abstract: K-fold cross-validation (CV) with squared error loss is widely used for evaluating predictive models, especially when strong distributional assumptions cannot be made. However, CV with squared error loss is not free from distributional assumptions, particularly in cases involving non-i.i.d. data. This paper analyzes CV for correlated data. We present a criterion for the suitability of standard CV in the presence of correlations. When this criterion does not hold, we introduce a bias-corrected cross-validation estimator, which we term $CV_c$, that yields an unbiased estimate of prediction error in many settings where standard CV is invalid. We also demonstrate our results numerically, and find that introducing our correction substantially improves both model evaluation and model selection in simulations and real data studies.
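For orientation, the standard K-fold CV estimate with squared error loss that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration and not the paper's method: the function name `kfold_cv_mse`, the ordinary least squares model, and the simulated i.i.d. data are all assumptions made for the example, and the $CV_c$ correction is not shown because the abstract does not give its formula.

```python
import numpy as np


def kfold_cv_mse(X, y, k=5, seed=0):
    """Standard K-fold CV estimate of squared prediction error for OLS.

    Hypothetical helper for illustration only; it treats observations as
    exchangeable and applies no correction for correlated data.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.permutation(n)          # shuffle once, then split into k folds
    folds = np.array_split(idx, k)
    sq_err = np.empty(n)
    for test in folds:
        train = np.setdiff1d(idx, test)
        # Fit least squares on the training folds only.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Squared error on the held-out fold.
        sq_err[test] = (y[test] - X[test] @ beta) ** 2
    return sq_err.mean()


# Simulated i.i.d. data (an assumption): intercept 1, slope 2, unit-variance noise.
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

cv_error = kfold_cv_mse(X, y, k=5)
print(cv_error)
```

With independent noise, as simulated here, the estimate should land near the noise variance. The abstract's point is that this need not hold once observations are correlated.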
Submission history
From: Assaf Rabinowicz
[v1] Thu, 4 Apr 2019 09:58:35 GMT (99kb,D)
[v2] Sun, 12 May 2019 10:26:05 GMT (50kb,D)
[v3] Fri, 24 Apr 2020 15:38:35 GMT (263kb,D)