Title: Feature selection in high-dimensional dataset using MapReduce

Abstract: This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance (mRMR) algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open-source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.
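
To make the idea in the abstract concrete, the following is a minimal single-process sketch of minimum Redundancy Maximum Relevance (mRMR) selection written in a map/reduce style. It is not the paper's Hadoop/Spark implementation. It assumes discrete features, a tall/narrow layout (one row per observation, class label in the last column), and the common "relevance minus mean redundancy" score; the paper's exact criterion may differ. All function names (map_row, mutual_information, mrmr) are hypothetical.

```python
from collections import Counter
from functools import reduce
from math import log


def map_row(row, candidates, references):
    """Map step: one row emits a count of 1 per (candidate, reference) value pair."""
    return Counter((j, r, row[j], row[r]) for j in candidates for r in references)


def mutual_information(rows, candidates, references):
    """Reduce step: sum the per-row counts, then turn each contingency
    table into a mutual information estimate (in nats)."""
    n = len(rows)
    counts = reduce(
        lambda a, b: a + b,
        (map_row(row, candidates, references) for row in rows),
        Counter(),
    )
    mi = {}
    for j in candidates:
        for r in references:
            joint = {(a, b): c for (jj, rr, a, b), c in counts.items()
                     if (jj, rr) == (j, r)}
            pa, pb = Counter(), Counter()
            for (a, b), c in joint.items():
                pa[a] += c
                pb[b] += c
            # sum over cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
            mi[(j, r)] = sum(c / n * log(c * n / (pa[a] * pb[b]))
                             for (a, b), c in joint.items())
    return mi


def mrmr(rows, k):
    """Greedy mRMR: pick k feature indices; the class label is the last column."""
    target = len(rows[0]) - 1
    remaining = list(range(target))
    relevance = mutual_information(rows, remaining, [target])
    redundancy = {j: 0.0 for j in remaining}
    selected = []
    while remaining and len(selected) < k:
        if selected:
            # Only the most recently selected feature needs a new pass over the data.
            last = selected[-1]
            pair_mi = mutual_information(rows, remaining, [last])
            for j in remaining:
                redundancy[j] += pair_mi[(j, last)]

        def score(j):
            mean_redundancy = redundancy[j] / len(selected) if selected else 0.0
            return relevance[(j, target)] - mean_redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: columns are (x0, x1, x2, label); x1 duplicates x0, label = x0 + x2.
rows = [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 2)]
chosen = mrmr(rows, 2)
```

In this toy dataset the duplicated column x1 is as relevant as x0 but fully redundant with it, so the sketch skips it in favour of x2. In a distributed setting, the per-row map_row calls and the summation of counts are the parts that would run across workers; the greedy loop itself stays on the driver.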
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Machine Learning (stat.ML)
MSC classes: 68W15
ACM classes: D.1.3
Cite as: arXiv:1709.02327 [cs.DC]
  (or arXiv:1709.02327v1 [cs.DC] for this version)

Submission history

From: Claudio Reggiani
[v1] Thu, 7 Sep 2017 16:05:51 GMT (569kb,D)
