We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:


Current browse context:


Change to browse by:

References & Citations


(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Statistics > Methodology

Title: A Two-Stage Variable Selection Approach for Correlated High Dimensional Predictors

Authors: Zhiyuan Li
Abstract: When fitting statistical models, some predictors are often found to be correlated with each other, and functioning together. Many group variable selection methods are developed to select the groups of predictors that are closely related to the continuous or categorical response. These existing methods usually assume the group structures are well known. For example, variables with similar practical meaning, or dummy variables created by categorical data. However, in practice, it is impractical to know the exact group structure, especially when the variable dimensional is large. As a result, the group variable selection results may be selected. To solve the challenge, we propose a two-stage approach that combines a variable clustering stage and a group variable stage for the group variable selection problem. The variable clustering stage uses information from the data to find a group structure, which improves the performance of the existing group variable selection methods. For ultrahigh dimensional data, where the predictors are much larger than observations, we incorporated a variable screening method in the first stage and shows the advantages of such an approach. In this article, we compared and discussed the performance of four existing group variable selection methods under different simulation models, with and without the variable clustering stage. The two-stage method shows a better performance, in terms of the prediction accuracy, as well as in the accuracy to select active predictors. An athlete's data is also used to show the advantages of the proposed method.
Comments: 30 pages, 7 figures
Subjects: Methodology (stat.ME); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)
Cite as: arXiv:2103.13357 [stat.ME]
  (or arXiv:2103.13357v1 [stat.ME] for this version)

Submission history

From: Zhiyuan Li [view email]
[v1] Wed, 24 Mar 2021 17:28:34 GMT (1314kb,D)

Link back to: arXiv, form interface, contact.