Feature selection in high-dimensional dataset using MapReduce

Claudio Reggiani, Yann-Aël Le Borgne, Gianluca Bontempi

Computer Science > Distributed, Parallel, and Cluster Computing

(Submitted on 7 Sep 2017)

Abstract: This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
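As a rough illustration of the idea in the abstract, the following single-machine Python sketch expresses one common form of mRMR as a map step (score each candidate feature independently) followed by a reduce step (keep the best-scoring candidate). It is not the authors' Hadoop/Spark implementation. The scoring rule used here (mutual information with the target minus the mean mutual information with already selected features) is an assumption, since the abstract does not state the exact criterion, and all function names and the toy data are hypothetical.

```python
from collections import Counter
from functools import reduce
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def map_score(name, column, target, selected_columns):
    """Map step: score one candidate feature independently of the others."""
    relevance = mutual_information(column, target)
    if selected_columns:
        redundancy = sum(
            mutual_information(column, s) for s in selected_columns
        ) / len(selected_columns)
    else:
        redundancy = 0.0
    return name, relevance - redundancy


def reduce_best(a, b):
    """Reduce step: keep the (name, score) pair with the higher score."""
    return a if a[1] >= b[1] else b


def mrmr(features, target, k):
    """Greedily select up to k feature names from a dict of name -> column."""
    selected = []
    while len(selected) < min(k, len(features)):
        chosen = [features[s] for s in selected]
        scored = (
            map_score(name, col, target, chosen)
            for name, col in features.items()
            if name not in selected
        )
        best, _ = reduce(reduce_best, scored)
        selected.append(best)
    return selected


# Toy data (hypothetical): "b" duplicates "a"; "c" is independent of "a"
# and only weakly related to the target.
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 1, 1, 1, 1],
    "c": [0, 0, 1, 1, 0, 0, 1, 1],
}
target = [0, 0, 0, 1, 1, 1, 1, 1]

selected = mrmr(features, target, k=2)
print(selected)
```

On this toy data the sketch picks "a" first and then "c" rather than "b": "b" is as relevant as "a" but fully redundant with it, so the redundancy term pushes its score below that of the weakly relevant "c". In a distributed setting the map step would run in parallel over partitions of the candidate features, which is the part this sketch only imitates with a generator.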

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)
MSC classes:	68W15
ACM classes:	D.1.3
Cite as:	arXiv:1709.02327 [cs.DC]
	(or arXiv:1709.02327v1 [cs.DC] for this version)

Submission history

From: Claudio Reggiani
[v1] Thu, 7 Sep 2017 16:05:51 GMT (569kb,D)