Title: Sparse Variable Selection on High Dimensional Heterogeneous Data with Tree Structured Responses
(Submitted on 22 Nov 2017 (v1), last revised 19 Apr 2024 (this version, v2))
Abstract: We consider the problem of sparse variable selection on high-dimensional heterogeneous data sets, which has attracted renewed interest recently due to the growth of biological and medical data sets with complex, non-i.i.d. structures and large numbers of response variables. The heterogeneity is likely to confound the association between explanatory variables and responses, resulting in many false discoveries when Lasso or its variants are naively applied. Developing effective confounder correction methods has therefore become an active research topic. However, applying recent confounder correction methods directly yields poor performance because they ignore the complex interdependency among response variables. To improve on current variable selection methods, we introduce the tree-guided sparse linear mixed model, which uses the dependency information from multiple responses to explore how specific the clusters are and to select the active variables from heterogeneous data. Through extensive experiments on synthetic and real data sets, we show that our proposed model outperforms existing methods and achieves the highest area under the ROC curve.
Submission history
From: Xiang Liu
[v1] Wed, 22 Nov 2017 13:18:37 GMT (6300kb,D)
[v2] Fri, 19 Apr 2024 03:31:28 GMT (10398kb,D)