The xyz algorithm for fast interaction search in high-dimensional data

Thanei, Gian-Andrea; Meinshausen, Nicolai; Shah, Rajen D.

(Submitted on 17 Oct 2016 (v1), last revised 17 Sep 2018 (this version, v4))

Abstract: When performing regression on a dataset with $p$ variables, it is often of interest to go beyond using main linear effects and include interactions as products between individual variables. For small-scale problems, these interactions can be computed explicitly, but this leads to a computational complexity of at least $\mathcal{O}(p^2)$ if done naively. This cost can be prohibitive if $p$ is very large. We introduce a new randomised algorithm that is able to discover interactions with high probability and under mild conditions has a runtime that is subquadratic in $p$. We show that strong interactions can be discovered in almost linear time, whilst finding weaker interactions requires $\mathcal{O}(p^\alpha)$ operations for $1 < \alpha < 2$ depending on their strength. The underlying idea is to transform interaction search into a closest-pair problem which can be solved efficiently in subquadratic time. The algorithm is called $\mathit{xyz}$ and is implemented in the language R. We demonstrate its efficiency for application to genome-wide association studies, where more than $10^{11}$ interactions can be screened in under $280$ seconds with a single-core $1.2$ GHz CPU.
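
The reduction to a closest-pair problem can be illustrated with a small Python sketch. It is not the authors' R implementation: it assumes binary data coded as -1/+1, all function names and parameters are hypothetical, and exact hashing of subsampled rows is used here as a simple stand-in for a closest-pair solver. Under that coding, y_i = x_ij * x_ik holds exactly when y_i * x_ij = x_ik, so a strong interaction (j, k) corresponds to column j of Z = y * X lying close to column k of X.

```python
import random
from collections import defaultdict


def make_planted_data(n, p, j, k, rng):
    """Random -1/+1 design with a noiseless planted interaction y = x_j * x_k."""
    X = [[rng.choice((-1, 1)) for _ in range(p)] for _ in range(n)]
    y = [row[j] * row[k] for row in X]
    return X, y


def naive_interaction_search(X, y):
    """Baseline: score all p*(p-1)/2 pairs by the fraction of rows where
    y_i == x_ij * x_ik, and return the best pair with its score."""
    n, p = len(X), len(X[0])
    best, best_score = None, -1.0
    for j in range(p):
        for k in range(j + 1, p):
            score = sum(1 for i in range(n) if y[i] == X[i][j] * X[i][k]) / n
            if score > best_score:
                best, best_score = (j, k), score
    return best, best_score


def subsample_collision_search(X, y, subset_size, repeats, rng):
    """Assumed simplification: on a random subset of rows, report pairs (j, k)
    whose columns Z_j = y * X_j and X_k agree exactly. Strong interactions
    collide often; unrelated pairs collide with probability 2**-subset_size."""
    n, p = len(X), len(X[0])
    candidates = set()
    for _ in range(repeats):
        rows = [rng.randrange(n) for _ in range(subset_size)]
        # Bucket the columns of X by their values on the subsampled rows.
        buckets = defaultdict(list)
        for k in range(p):
            buckets[tuple(X[i][k] for i in rows)].append(k)
        # Look up each column of Z = y * X in the same buckets.
        for j in range(p):
            key = tuple(y[i] * X[i][j] for i in rows)
            for k in buckets.get(key, ()):
                if j < k:  # the match is symmetric in (j, k); keep one ordering
                    candidates.add((j, k))
    return candidates


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_planted_data(n=200, p=30, j=3, k=7, rng=rng)
    print(naive_interaction_search(X, y))
    print(subsample_collision_search(X, y, subset_size=20, repeats=3, rng=rng))
```

In this sketch each repeat touches every column once, plus the work of enumerating collisions, rather than every pair of columns. With a noisy interaction an exact match on the subset is no longer guaranteed, so more repeats or a smaller subset would be needed, which is in line with the abstract's statement that weaker interactions cost more to find.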

Subjects:	Machine Learning (stat.ML); Computation (stat.CO)
MSC classes:	62-04
Journal reference:	Journal of Machine Learning Research, 19(37):1-42, 2018
Cite as:	arXiv:1610.05108 [stat.ML]
	(or arXiv:1610.05108v4 [stat.ML] for this version)

Submission history

From: Gian-Andrea Thanei [view email]
[v1] Mon, 17 Oct 2016 13:42:22 GMT (403kb,D)
[v2] Thu, 20 Oct 2016 06:29:04 GMT (1393kb,D)
[v3] Sun, 17 Dec 2017 23:02:36 GMT (414kb,D)
[v4] Mon, 17 Sep 2018 20:25:04 GMT (413kb,D)
