Title: A general framework for formulating structured variable selection

Abstract: In variable selection, a selection rule that prescribes the permissible sets of selected variables (called a "selection dictionary") is desirable due to the inherent structural constraints among the candidate variables. Such selection rules can be complex in real-world data analyses, and failing to incorporate such restrictions could not only compromise the interpretability of the model but also lead to decreased prediction accuracy. However, no general framework has been proposed to formalize selection rules and their applications, which poses a significant challenge for practitioners seeking to integrate these rules into their analyses. In this work, we establish a framework for structured variable selection that can incorporate universal structural constraints. We develop a mathematical language for constructing arbitrary selection rules, where the selection dictionary is formally defined. We demonstrate that all selection rules can be expressed as combinations of operations on constructs, facilitating the identification of the corresponding selection dictionary. Once this selection dictionary is derived, practitioners can apply their own user-defined criteria to select the optimal model. Additionally, our framework enhances existing penalized regression methods for variable selection by providing guidance on how to appropriately group variables to achieve the desired selection rule. Furthermore, our innovative framework opens the door to establishing new ℓ0-norm-based penalized regression techniques that can be tailored to respect arbitrary selection rules, thereby expanding the possibilities for more robust and tailored model development.
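
To make the notion of a selection dictionary concrete, the following is a minimal sketch that enumerates the permissible subsets for one toy rule by brute force. It is not the paper's mathematical language or its operations on constructs. The variable names `A`, `B`, `AB`, the helper functions, and the strong-heredity rule are all assumptions chosen for illustration.

```python
from itertools import chain, combinations


def power_set(variables):
    """Return every subset of `variables` as a frozenset."""
    items = list(variables)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def selection_dictionary(variables, rule):
    """Keep only the subsets that satisfy `rule`, a predicate on a frozenset."""
    return [s for s in power_set(variables) if rule(s)]


def strong_heredity(selected):
    # Hypothetical rule: the interaction "AB" may be selected
    # only when both main effects "A" and "B" are selected too.
    return "AB" not in selected or {"A", "B"} <= selected


# Hypothetical candidate variables: two main effects and their interaction.
candidates = ["A", "B", "AB"]
dictionary = selection_dictionary(candidates, strong_heredity)

for subset in dictionary:
    print(sorted(subset))
```

Under this toy rule, five of the eight possible subsets are permissible. A user-defined criterion (for example, cross-validated error) could then be evaluated over these subsets alone. Brute-force enumeration grows exponentially in the number of candidates, so this is only practical for very small problems.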
Comments: 14 pages
Subjects: Methodology (stat.ME)
Journal reference: Transactions on Machine Learning Research (2024/01)
Cite as: arXiv:2110.01031 [stat.ME]
  (or arXiv:2110.01031v4 [stat.ME] for this version)

Submission history

From: Guanbo Wang
[v1] Sun, 3 Oct 2021 15:52:22 GMT (37kb)
[v2] Thu, 14 Apr 2022 19:17:26 GMT (44kb)
[v3] Tue, 24 Oct 2023 19:34:24 GMT (47kb)
[v4] Mon, 15 Jan 2024 22:09:16 GMT (53kb)
