Title: Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests

Abstract: In this paper we present the practical benefits of a new random forest algorithm to deal with missing values in the sample. The purpose of this work is to compare the different solutions to deal with missing values with random forests and to describe our new algorithm's performance as well as its algorithmic complexity. A variety of missing value mechanisms (such as MCAR, MAR, MNAR) are considered and simulated. We study the quadratic errors and the bias of our algorithm and compare it to the most popular missing values random forests algorithms in the literature. In particular, we compare those techniques for both a regression and prediction purpose. This work follows a first paper, Gomez-Mendez and Joly (2020), on the consistency of this new algorithm.
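
The three missing value mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the simulation protocol used in the paper; the function name missing_mask, the logistic link, and the default rate of 0.3 are all assumptions chosen for illustration. Under MCAR the probability of masking a value is constant, under MAR it depends only on a fully observed covariate, and under MNAR it depends on the masked value itself.

```python
import math
import random


def sigmoid(t):
    """Logistic function, used here as a hypothetical link for missingness."""
    return 1.0 / (1.0 + math.exp(-t))


def missing_mask(x_obs, x_target, mechanism, rate=0.3, seed=0):
    """Return a list of booleans, True where x_target would be masked.

    x_obs    : covariate assumed to be always observed
    x_target : covariate that receives missing values
    mechanism: "MCAR", "MAR" or "MNAR"
    rate     : approximate average missing rate (illustrative assumption;
               roughly accurate when the covariates are centred at zero)
    """
    rng = random.Random(seed)
    if mechanism == "MCAR":
        # Missingness is independent of every variable.
        probs = [rate for _ in x_target]
    elif mechanism == "MAR":
        # Missingness depends only on the always-observed covariate.
        probs = [min(1.0, 2.0 * rate * sigmoid(v)) for v in x_obs]
    elif mechanism == "MNAR":
        # Missingness depends on the value that is being masked.
        probs = [min(1.0, 2.0 * rate * sigmoid(v)) for v in x_target]
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    return [rng.random() < p for p in probs]


if __name__ == "__main__":
    data_rng = random.Random(1)
    x1 = [data_rng.gauss(0, 1) for _ in range(1000)]
    x2 = [data_rng.gauss(0, 1) for _ in range(1000)]
    for mech in ("MCAR", "MAR", "MNAR"):
        mask = missing_mask(x1, x2, mech)
        print(mech, "empirical missing rate:", sum(mask) / len(mask))
```

In this sketch the MNAR mask removes large values of x_target more often than small ones, so the observed values are a biased sample. That bias is what makes MNAR harder than MCAR for any imputation or tree-based method.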
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as: arXiv:2110.09333 [math.ST]
  (or arXiv:2110.09333v1 [math.ST] for this version)

Submission history

From: Emilien Joly
[v1] Mon, 18 Oct 2021 14:02:15 GMT (2733kb,D)
