Mathematics > Statistics Theory
Title: A simple extension of Azadkia $\&$ Chatterjee's rank correlation to a vector of endogenous variables
(Submitted on 3 Dec 2022 (v1), last revised 3 Mar 2023 (this version, v2))
Abstract: We propose a direct and natural extension of Azadkia & Chatterjee's rank correlation $T$ introduced in [4] to a set of $q \geq 1$ endogenous variables. The approach builds upon converting the original vector-valued problem into a univariate problem and then applying the rank correlation $T$ to it. The novel measure $T^q$ then quantifies the scale-invariant extent of functional dependence of an endogenous vector ${\bf Y} = (Y_1,\dots,Y_q)$ on a number of exogenous variables ${\bf X} = (X_1,\dots,X_p)$, $p\geq1$, characterizes independence of ${\bf X}$ and ${\bf Y}$ as well as perfect dependence of ${\bf Y}$ on ${\bf X}$ and hence fulfills all the desired characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various general invariance and continuity conditions for $T^q$ as well as novel ordering results for conditional distributions, revealing new insights into the nature of $T$. Building upon the graph-based estimator for $T$ in [4], we present a non-parametric estimator for $T^q$ that is strongly consistent in full generality, i.e., without any distributional assumptions. Based on this estimator we develop a model-free and dependence-based feature ranking and forward feature selection of multiple-outcome data, and establish tools for identifying networks between random variables. Real case studies illustrate the main aspects of the developed methodology.
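For orientation, the univariate building block $T$ can be estimated from nearest-neighbour ranks. The following is a minimal sketch, not the $T^q$ estimator proposed in the paper. It assumes that, without conditioning variables, the graph-based estimator of [4] takes the form $T_n = \sum_i (n \min\{R_i, R_{M(i)}\} - L_i^2) / \sum_i L_i (n - L_i)$, where $R_i = \#\{j : Y_j \le Y_i\}$, $L_i = \#\{j : Y_j \ge Y_i\}$ and $M(i)$ is the index of the nearest neighbour of ${\bf X}_i$. The function name `azadkia_chatterjee_T` is hypothetical, and nearest-neighbour ties are broken by lowest index here, which is a simplification.

```python
import numpy as np


def azadkia_chatterjee_T(x, y):
    """Sketch of the unconditional rank correlation estimate T_n(Y | X).

    x: array of shape (n,) or (n, p) holding the exogenous variables.
    y: array of shape (n,) holding a single endogenous variable.
    Assumption: formula as stated in the lead-in; O(n^2) memory, so small n only.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n = len(y)

    # R_i = #{j : y_j <= y_i} and L_i = #{j : y_j >= y_i}
    R = (y[None, :] <= y[:, None]).sum(axis=1)
    L = (y[None, :] >= y[:, None]).sum(axis=1)

    # M(i): index of the Euclidean nearest neighbour of x_i among the other points.
    # Ties go to the lowest index (a simplification of this sketch).
    d = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)
    M = d.argmin(axis=1)

    num = (n * np.minimum(R, R[M]) - L ** 2).sum()
    den = (L * (n - L)).sum()
    return num / den


rng = np.random.default_rng(0)
x_dep = np.arange(100.0)
t_dependent = azadkia_chatterjee_T(x_dep, x_dep)  # Y is a function of X
t_independent = azadkia_chatterjee_T(rng.normal(size=500), rng.normal(size=500))
```

Under these assumptions the estimate should be close to 1 when $Y$ is a function of ${\bf X}$ and close to 0 when the two are independent, in line with the characterization stated in the abstract.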
Submission history
From: Sebastian Fuchs
[v1] Sat, 3 Dec 2022 14:24:14 GMT (29kb)
[v2] Fri, 3 Mar 2023 12:15:25 GMT (193kb,D)