Mathematics > Statistics Theory
Title: Efficient multivariate entropy estimation via $k$-nearest neighbour distances
(Submitted on 1 Jun 2016 (v1), revised 14 Jul 2016 (this version, v2), latest version 22 Jun 2017 (v3))
Abstract: Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient in the sense of achieving the local asymptotic minimax lower bound. To this end, we initially study a generalisation of the estimator originally proposed by Kozachenko and Leonenko (1987), based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^d$. When $d \leq 3$ and provided $k/\log^5 n \rightarrow \infty$ (as well as other regularity conditions), we show that the estimator is efficient; on the other hand, when $d\geq 4$, a non-trivial bias precludes its efficiency regardless of the choice of $k$. This motivates us to consider a new entropy estimator, formed as a weighted average of Kozachenko--Leonenko estimators for different values of $k$. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness. In addition to the new estimator proposed and theoretical understanding provided, our results also have other methodological implications; in particular, they motivate the prewhitening of the data before applying the estimator and facilitate the construction of asymptotically valid confidence intervals of asymptotically minimal width.
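As a rough illustration of the two estimators the abstract describes, the sketch below computes a $k$-nearest neighbour entropy estimate in the commonly stated Kozachenko--Leonenko form, $\hat{H}_n = \frac{1}{n}\sum_{i=1}^n \log\bigl(\rho_{(k),i}^d V_d (n-1) / e^{\Psi(k)}\bigr)$, where $\rho_{(k),i}$ is the distance from the $i$th point to its $k$th nearest neighbour, $V_d$ is the volume of the unit ball in $\mathbb{R}^d$ and $\Psi$ is the digamma function. It also forms a weighted average over several values of $k$. The function names (`kl_entropy`, `weighted_kl_entropy`, and the helpers) are hypothetical, the neighbour search is brute force for clarity, and the weights are supplied by the caller: the particular weight conditions needed for efficiency are given in the paper and are not reproduced here.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(k):
    """Digamma at a positive integer: psi(k) = -gamma + sum_{j<k} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def unit_ball_volume(d):
    """Volume of the Euclidean unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(1 + d / 2)


def knn_distances(points, k_max):
    """For each point, the sorted distances to its k_max nearest neighbours.

    Brute force, O(n^2 log n); a spatial index would be used for large n.
    """
    out = []
    for i, x in enumerate(points):
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        out.append(dists[:k_max])
    return out


def kl_entropy(points, k, knn=None):
    """Kozachenko-Leonenko-type entropy estimate (in nats) using the k-th neighbour."""
    n, d = len(points), len(points[0])
    if knn is None:
        knn = knn_distances(points, k)
    const = math.log(unit_ball_volume(d)) + math.log(n - 1) - digamma_int(k)
    return sum(d * math.log(r[k - 1]) for r in knn) / n + const


def weighted_kl_entropy(points, weights):
    """Weighted average of the estimates for k = 1, ..., len(weights).

    Assumption: the caller supplies weights summing to 1; weights[j] is
    applied to the estimate with k = j + 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    knn = knn_distances(points, len(weights))
    return sum(
        w * kl_entropy(points, j + 1, knn) for j, w in enumerate(weights) if w != 0
    )


if __name__ == "__main__":
    random.seed(0)
    sample = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(400)]
    print("k = 3 estimate:", kl_entropy(sample, 3))
    print("true entropy  :", math.log(2 * math.pi * math.e))
```

For a standard bivariate normal sample the true entropy is $\log(2\pi e)$, so the printed estimate can be compared against that value. Passing a weight vector with a single non-zero entry to `weighted_kl_entropy` recovers the unweighted estimate for that $k$.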
Submission history
From: Richard Samworth
[v1] Wed, 1 Jun 2016 14:32:47 GMT (46kb,D)
[v2] Thu, 14 Jul 2016 14:59:54 GMT (52kb,D)
[v3] Thu, 22 Jun 2017 15:53:10 GMT (74kb)