# Mathematics > Statistics Theory

# Title: Estimation and Concentration of Missing Mass of Functions of Discrete Probability Distributions

(Submitted on 5 Oct 2021)

Abstract: Given a positive function $g$ from $[0,1]$ to the reals, the function's missing mass in a sequence of i.i.d. samples, defined as the sum of $g(\Pr(x))$ over the missing letters $x$ (the letters that do not appear in the sample), is introduced and studied. The missing mass of a function generalizes the classical missing mass and has several connections to related estimation problems. Minimax estimation is studied for order-$\alpha$ missing mass ($g(p)=p^{\alpha}$) for both integer and non-integer values of $\alpha$. Exact minimax convergence rates are obtained for the integer case. Concentration is studied for a class of functions, and specific results are derived for order-$\alpha$ missing mass and missing Shannon entropy ($g(p)=-p\log p$). Sub-Gaussian tail bounds with near-optimal worst-case variance factors are derived. Two new notions of concentration, named strongly sub-Gamma and filtered sub-Gaussian concentration, are introduced and shown to yield right tail bounds that are better than those obtained from sub-Gaussian concentration.
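To make the definition in the abstract concrete, the following is a minimal Python sketch, not code from the paper. It computes the missing mass of a function $g$ when the true distribution is known, which is the quantity the paper sets out to estimate and bound. The helper name `missing_mass`, the four-letter alphabet, and the sample are all hypothetical choices made for illustration.

```python
import math

def missing_mass(g, probs, sample):
    """Sum g(p_x) over the letters x that never appear in the sample.

    probs  : dict mapping each letter to its true probability (assumed known here)
    sample : iterable of observed letters
    """
    seen = set(sample)
    return sum(g(p) for x, p in probs.items() if x not in seen)

# Hypothetical distribution and sample; letters "c" and "d" are missing.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
sample = ["a", "b", "a", "a"]

# Classical missing mass: g(p) = p
classical = missing_mass(lambda p: p, probs, sample)

# Order-alpha missing mass with alpha = 2: g(p) = p^2
order_2 = missing_mass(lambda p: p ** 2, probs, sample)

# Missing Shannon entropy: g(p) = -p log p (natural logarithm)
entropy = missing_mass(lambda p: -p * math.log(p), probs, sample)

print(classical, order_2, entropy)
```

In this toy case the classical missing mass is $0.125 + 0.125 = 0.25$ and the order-2 missing mass is $2 \cdot 0.125^2 = 0.03125$. In the estimation setting `probs` is unknown and only `sample` is observed, so this function describes the target of estimation rather than an estimator.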
