We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

stat.ML

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Statistics > Machine Learning

Title: Distances with mixed type variables some modified Gower's coefficients

Abstract: Nearest neighbor methods have become popular in official statistics, mainly in imputation or in statistical matching problems; they play a key role in machine learning too, where a high number of variants have been proposed. The choice of the distance function depends mainly on the type of the selected variables. Unfortunately, relatively few options permit to handle mixed type variables, a situation frequently encountered in official statistics. The most popular distance for mixed type variables is derived as the complement of the Gower's similarity coefficient; it is appealing because ranges between 0 and 1 and allows to handle missing values. Unfortunately, the unweighted standard setting the contribution of the single variables to the overall Gower's distance is unbalanced because of the different nature of the variables themselves. This article tries to address the main drawbacks that affect the overall unweighted Gower's distance by suggesting some modifications in calculating the distance on the interval and ratio scaled variables. Simple modifications try to attenuate the impact of outliers on the scaled Manhattan distance; other modifications, relying on the kernel density estimation methods attempt to reduce the unbalanced contribution of the different types of variables. The performance of the proposals is evaluated in simulations mimicking the imputation of missing values through nearest neighbor distance hotdeck method.
Comments: 17 pages without figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2101.02481 [stat.ML]
  (or arXiv:2101.02481v1 [stat.ML] for this version)

Submission history

From: Marcello D'Orazio [view email]
[v1] Thu, 7 Jan 2021 11:00:57 GMT (577kb)

Link back to: arXiv, form interface, contact.