Mathematics > Statistics Theory
Title: Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices
(Submitted on 23 Dec 2019 (v1), last revised 24 Feb 2022 (this version, v3))
Abstract: We consider random-design linear prediction and related questions on the lower tail of random matrices. It is known that, under boundedness constraints, the minimax risk is of order $d/n$ in dimension $d$ with $n$ samples. Here, we study the minimax expected excess risk over the full linear class, depending on the distribution of covariates. First, the least squares estimator is exactly minimax optimal in the well-specified case, for every distribution of covariates. We express the minimax risk in terms of the distribution of statistical leverage scores of individual samples, and deduce a minimax lower bound of $d/(n-d+1)$ for any covariate distribution, nearly matching the risk for Gaussian design. We then obtain sharp nonasymptotic upper bounds for covariates that satisfy a "small ball"-type regularity condition in both well-specified and misspecified cases. Our main technical contribution is the study of the lower tail of the smallest singular value of empirical covariance matrices at small values. We establish a lower bound on this lower tail, valid for any distribution in dimension $d \geq 2$, together with a matching upper bound under a necessary regularity condition. Our proof relies on the PAC-Bayes technique for controlling empirical processes, and extends an analysis of Oliveira devoted to a different part of the lower tail.
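The comparison between the lower bound $d/(n-d+1)$ and the Gaussian-design risk can be checked numerically. The following is a minimal sketch, not taken from the paper, under assumptions that are ours: unit noise variance, identity covariance, and a well-specified model, in which case the expected excess risk of least squares equals $\mathbb{E}[\mathrm{Tr}((X^\top X)^{-1})]$. For standard Gaussian covariates this expectation is $d/(n-d-1)$ when $n > d+1$, a standard inverse-Wishart identity. The function name and parameter values are hypothetical.

```python
import numpy as np


def gaussian_ols_excess_risk(n, d, sigma2=1.0, trials=2000, seed=0):
    """Monte Carlo estimate of the expected excess risk of least squares.

    Assumptions (ours, for illustration): covariates are i.i.d. N(0, I_d),
    the model is well-specified, and the noise has variance sigma2. Under
    these assumptions the excess risk is sigma2 * E[Tr((X^T X)^{-1})].
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        X = rng.standard_normal((n, d))          # n samples in dimension d
        total += np.trace(np.linalg.inv(X.T @ X))
    return sigma2 * total / trials


n, d = 50, 5
estimate = gaussian_ols_excess_risk(n, d)
lower_bound = d / (n - d + 1)      # minimax lower bound quoted in the abstract
gaussian_exact = d / (n - d - 1)   # closed form for Gaussian design

print(f"lower bound d/(n-d+1):    {lower_bound:.4f}")
print(f"Gaussian risk d/(n-d-1):  {gaussian_exact:.4f}")
print(f"Monte Carlo estimate:     {estimate:.4f}")
```

With these values the two closed forms differ only in the denominator (46 versus 44), which illustrates what "nearly matching" means when $n$ is well above $d$.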
Submission history
From: Jaouad Mourtada
[v1] Mon, 23 Dec 2019 12:08:09 GMT (53kb)
[v2] Fri, 27 Mar 2020 16:46:12 GMT (54kb)
[v3] Thu, 24 Feb 2022 16:09:36 GMT (45kb)