Computer Science > Information Theory
Title: Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions
(Submitted on 30 May 2016 (v1), revised 24 Nov 2016 (this version, v2), latest version 3 Mar 2021 (v5))
Abstract: We refine the general methodology in [1] for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite-dimensional parameters, and elaborate on the case of discrete distributions with support size $S$ comparable with the number of observations $n$. Specifically, we determine the "smooth" and "non-smooth" regimes based on the confidence set and the smoothness of the functional. In the "non-smooth" regime, we apply an unbiased estimator for a suitable polynomial approximation of the functional. In the "smooth" regime, we construct a general version of the bias-corrected Maximum Likelihood Estimator (MLE) based on a Taylor expansion.
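As a rough illustration of what a Taylor-based bias correction looks like in the simplest case (a single coordinate, second order only; this is a textbook sketch, not necessarily the exact construction used in the paper), suppose $n\hat{p} \sim \mathrm{Binomial}(n, p)$ and $f$ is twice continuously differentiable near $p$. The symbols $f$ and $\hat{p}$ are introduced here for illustration only.

$$
\mathbb{E}\, f(\hat{p}) \approx f(p) + \frac{f''(p)}{2}\,\mathrm{Var}(\hat{p}) = f(p) + \frac{f''(p)\, p(1-p)}{2n},
\qquad
\hat{f}_{\mathrm{bc}} = f(\hat{p}) - \frac{f''(\hat{p})\, \hat{p}(1-\hat{p})}{2n}.
$$

The correction subtracts a plug-in estimate of the leading bias term. It is only sensible where $f$ is smooth over the confidence set around $\hat{p}$, which is why the "non-smooth" regime is handled differently.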
We apply the general methodology to the problem of estimating the KL divergence between two discrete probability measures $P$ and $Q$ from empirical data in a non-asymptotic and possibly large alphabet setting. We construct minimax rate-optimal estimators for $D(P\|Q)$ when the likelihood ratio is upper bounded by a constant which may depend on the support size, and show that the performance of the optimal estimator with $n$ samples is essentially that of the MLE with $n\ln n$ samples. Our estimator is adaptive in the sense that it requires knowledge of neither the support size nor the upper bound on the likelihood ratio. We show that the general methodology results in minimax rate-optimal estimators for other divergences as well, such as the Hellinger distance and the $\chi^2$-divergence. Our approach refines the "Approximation" methodology recently developed for the construction of near minimax estimators of functionals of high-dimensional parameters, such as entropy, Rényi entropy, mutual information and $\ell_1$ distance in large alphabet settings, and shows that the "effective sample size enlargement" phenomenon holds significantly more widely than previously established.
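For orientation, the MLE baseline that the abstract compares against can be sketched as a plug-in estimate: replace $P$ and $Q$ by empirical frequencies and evaluate $\sum_i \hat{p}_i \ln(\hat{p}_i/\hat{q}_i)$. The sketch below is a minimal version of that baseline only, not the minimax rate-optimal estimator of the paper. The function name `plugin_kl` and the convention of returning infinity when a symbol seen under $P$ is unseen under $Q$ are assumptions made for illustration.

```python
import math
from collections import Counter


def plugin_kl(samples_p, samples_q):
    """Plug-in (MLE) estimate of D(P||Q) in nats from two i.i.d. samples.

    Hypothetical helper for illustration; this is the baseline estimator,
    not the bias-corrected / polynomial-approximation estimator of the paper.
    """
    n_p, n_q = len(samples_p), len(samples_q)
    if n_p == 0 or n_q == 0:
        raise ValueError("both samples must be non-empty")

    counts_p = Counter(samples_p)
    counts_q = Counter(samples_q)

    total = 0.0
    for symbol, c_p in counts_p.items():
        c_q = counts_q.get(symbol, 0)
        if c_q == 0:
            # Assumed convention: the empirical Q puts zero mass on a symbol
            # that the empirical P has seen, so the plug-in value is infinite.
            return math.inf
        p_hat = c_p / n_p
        q_hat = c_q / n_q
        total += p_hat * math.log(p_hat / q_hat)
    return total
```

The infinite value on unseen symbols shows one way the plug-in approach degrades when the alphabet is large relative to the sample size, which is the regime the abstract targets.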
Submission history
From: Yanjun Han
[v1] Mon, 30 May 2016 07:24:03 GMT (223kb,D)
[v2] Thu, 24 Nov 2016 04:17:05 GMT (235kb,D)
[v3] Mon, 25 May 2020 11:02:28 GMT (44kb)
[v4] Thu, 29 Oct 2020 16:49:30 GMT (46kb)
[v5] Wed, 3 Mar 2021 06:36:52 GMT (46kb)