Title: Minimax Rate-Optimal Estimation of Divergences between Discrete Distributions

Abstract: We refine the general methodology in [1] for the construction and analysis of essentially minimax estimators for a wide class of functionals of finite-dimensional parameters, and elaborate on the case of discrete distributions with support size $S$ comparable to the number of observations $n$. Specifically, we determine the "smooth" and "non-smooth" regimes based on the confidence set and the smoothness of the functional. In the "non-smooth" regime, we apply an unbiased estimator for a suitable polynomial approximation of the functional. In the "smooth" regime, we construct a general version of the bias-corrected Maximum Likelihood Estimator (MLE) based on a Taylor expansion.
We apply the general methodology to the problem of estimating the KL divergence between two discrete probability measures $P$ and $Q$ from empirical data in a non-asymptotic and possibly large alphabet setting. We construct minimax rate-optimal estimators for $D(P\|Q)$ when the likelihood ratio is upper bounded by a constant which may depend on the support size, and show that the performance of the optimal estimator with $n$ samples is essentially that of the MLE with $n\ln n$ samples. Our estimator is adaptive in the sense that it requires knowledge of neither the support size nor the upper bound on the likelihood ratio. We show that the general methodology results in minimax rate-optimal estimators for other divergences as well, such as the Hellinger distance and the $\chi^2$-divergence. Our approach refines the "Approximation" methodology recently developed for the construction of near-minimax estimators of functionals of high-dimensional parameters, such as entropy, Rényi entropy, mutual information and $\ell_1$ distance in large alphabet settings, and shows that the "effective sample size enlargement" phenomenon holds significantly more widely than previously established.
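For orientation, the baseline that the abstract compares against, the MLE (plug-in) estimator of $D(P\|Q) = \sum_x P(x) \ln \frac{P(x)}{Q(x)}$, can be sketched as follows. This is a minimal illustration only, not the minimax rate-optimal estimator constructed in the paper; the function name `plugin_kl` and the choice to return infinity when a symbol seen under $P$ is missing from the $Q$ sample are assumptions made for this sketch.

```python
import math
from collections import Counter


def plugin_kl(samples_p, samples_q):
    """Plug-in (MLE) estimate of D(P||Q) in nats from two i.i.d. samples.

    Hypothetical helper for illustration: it substitutes the empirical
    distributions for P and Q in the definition of the KL divergence.
    """
    n_p, n_q = len(samples_p), len(samples_q)
    if n_p == 0 or n_q == 0:
        raise ValueError("both samples must be non-empty")

    counts_p = Counter(samples_p)
    counts_q = Counter(samples_q)

    total = 0.0
    for symbol, c_p in counts_p.items():
        c_q = counts_q.get(symbol, 0)
        if c_q == 0:
            # Assumption of this sketch: the empirical Q puts zero mass on a
            # symbol observed under P, so the plug-in value is infinite.
            return math.inf
        p_hat = c_p / n_p
        q_hat = c_q / n_q
        total += p_hat * math.log(p_hat / q_hat)
    return total


if __name__ == "__main__":
    # Empirical P = (1/2, 1/2), empirical Q = (1/4, 3/4) on symbols a, b.
    print(plugin_kl(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

With a large alphabet, many symbols are observed only a few times or not at all, so this plug-in value can be heavily biased or infinite. That is the regime in which the abstract reports that the optimal estimator with $n$ samples performs essentially like the MLE with $n\ln n$ samples.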
Subjects: Information Theory (cs.IT); Statistics Theory (math.ST)
Cite as: arXiv:1605.09124 [cs.IT]
  (or arXiv:1605.09124v2 [cs.IT] for this version)

Submission history

From: Yanjun Han
[v1] Mon, 30 May 2016 07:24:03 GMT (223kb,D)
[v2] Thu, 24 Nov 2016 04:17:05 GMT (235kb,D)
[v3] Mon, 25 May 2020 11:02:28 GMT (44kb)
[v4] Thu, 29 Oct 2020 16:49:30 GMT (46kb)
[v5] Wed, 3 Mar 2021 06:36:52 GMT (46kb)
