Title: Robust selection of predictors and conditional outlier detection in a perturbed large-dimensional regression context

Abstract: This paper presents ROBOUT, a fast methodology for identifying outliers in a response variable conditional on a set of linearly related predictors retrieved from a large granular dataset. ROBOUT is shown to be effective and particularly versatile compared with existing methods in the presence of several idiosyncratic data features. It can identify observations with outlying conditional variance when the dataset contains element-wise sparse variables and the set of predictors contains multivariate outliers. Existing integrated methodologies such as SPARSE-LTS and RLARS are systematically sub-optimal under those conditions. ROBOUT comprises three stages: a robust selection of the statistically relevant predictors (using a Huber or a quantile loss), the estimation of a robust regression model on the selected predictors (by LTS, GS or MM), and a criterion that identifies conditional outliers from a robust measure of the residuals' dispersion. We conduct a comprehensive simulation study in which the different variants of the proposed algorithm are tested under an exhaustive set of perturbation scenarios. The methodology is also applied to a granular supervisory banking dataset collected by the European Central Bank.
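
The select, fit, flag pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It is a toy version under stated assumptions: the selection stage is an L1-penalised Huber regression solved by plain proximal gradient, the regression stage is a simplified LTS using concentration steps from a single start, and the outlier rule compares residuals with a MAD-based scale. The function names, the penalty `lam`, the Huber threshold `delta`, the trimming fraction and the cutoff of 3 are all hypothetical choices made for illustration, as is the simulated data.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def huber_lasso(X, y, lam, delta=1.345, n_iter=500):
    """Stage 1 (assumed form): L1-penalised Huber regression via proximal gradient."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X])          # intercept column + predictors
    step = n / np.linalg.norm(Z, 2) ** 2          # 1 / bound on the gradient's Lipschitz constant
    theta = np.zeros(p + 1)
    theta[0] = np.median(y)                       # robust starting intercept
    for _ in range(n_iter):
        psi = np.clip(y - Z @ theta, -delta, delta)         # Huber score of the residuals
        theta = theta + step * (Z.T @ psi) / n               # gradient descent step
        theta[1:] = soft_threshold(theta[1:], step * lam)    # intercept is not penalised
    return theta

def lts_csteps(X, y, start_resid, h, n_iter=50):
    """Stage 2 (simplified): least trimmed squares by concentration steps, one start."""
    Z = np.column_stack([np.ones(len(y)), X])
    resid, subset = start_resid, None
    for _ in range(n_iter):
        new_subset = np.sort(np.argsort(resid ** 2)[:h])     # h smallest squared residuals
        if subset is not None and np.array_equal(new_subset, subset):
            break                                            # subset stable: local optimum
        subset = new_subset
        coef, *_ = np.linalg.lstsq(Z[subset], y[subset], rcond=None)
        resid = y - Z @ coef
    return coef, resid

def flag_outliers(resid, cutoff=3.0):
    """Stage 3 (assumed rule): flag residuals beyond cutoff times a MAD-based scale."""
    centre = np.median(resid)
    scale = 1.4826 * np.median(np.abs(resid - centre))
    return np.flatnonzero(np.abs(resid - centre) > cutoff * scale)

# Simulated example: 3 relevant predictors out of 20, first 10 responses shifted.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 2.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
y[:10] += 15.0                                    # conditional outliers in the response

theta = huber_lasso(X, y, lam=0.3)
selected = np.flatnonzero(theta[1:] != 0)         # predictors kept by the penalty
start_resid = y - theta[0] - X @ theta[1:]
coef, resid = lts_csteps(X[:, selected], y, start_resid, h=int(0.75 * n))
flagged = flag_outliers(resid)

print("selected predictors:", selected)
print("flagged observations:", flagged)
```

Starting the concentration steps from the stage-1 residuals keeps the sketch deterministic and short. A production LTS fit would normally use many random starts, and the quantile-loss, GS and MM variants named in the abstract are not covered here.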
Subjects: Methodology (stat.ME); Computation (stat.CO)
Cite as: arXiv:2104.12208 [stat.ME]
  (or arXiv:2104.12208v1 [stat.ME] for this version)

Submission history

From: Dr. Matteo Farnè
[v1] Sun, 25 Apr 2021 17:08:41 GMT (1652kb)
