Mathematics > Probability
Title: Missing Mass Concentration for Markov Chains
(Submitted on 10 Jan 2020 (v1), last revised 14 Jan 2020 (this version, v2))
Abstract: The problem of missing mass in statistical inference (posed by McAllester and Ortiz, NIPS'02; most recently revisited by Changa and Thangaraj, ISIT'2019) asks for an estimate of the total weight of symbols that have not yet been sampled from a source.
So far, all approaches have focused on the IID model, which, although overly simplistic, is already not straightforward to tackle. The difficulty lies in handling correlated events and sums of variables with very different scales, for which classical concentration inequalities do not yield good bounds.
In this paper we develop the research on missing mass further, solving the problem for Markov chains. We reduce the problem to studying the tails of hitting times and finding log-additive approximations to them. More precisely, we combine the technique of majorization with certain estimates on set hitting times to show how the problem can eventually be reduced back to the IID case. Our contributions are: a) a new technique for obtaining missing mass bounds, in which the traditionally used negative association is replaced by majorization, which works for a wider class of processes; b) the first (exponential) concentration bounds for missing mass in Markov chain models; c) simplifications of recent results on set hitting times; and d) a simplified derivation of missing mass estimates for memoryless sources.
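To make the quantity in the abstract concrete, the following is a minimal Python sketch that computes the missing mass of a sample, once for an IID source and once for a Markov chain. It is an illustration only, not the paper's method. The function names, the lazy random walk on a cycle, and the choice to weight symbols by the chain's stationary distribution (uniform here, since the transition matrix is doubly stochastic) are all assumptions made for this example.

```python
import random


def missing_mass(weights, sample):
    """Total weight of the symbols that never appear in `sample`."""
    seen = set(sample)
    return sum(w for symbol, w in enumerate(weights) if symbol not in seen)


def sample_iid(weights, n, rng):
    """Draw n independent symbols with probabilities `weights`."""
    return rng.choices(range(len(weights)), weights=weights, k=n)


def sample_markov(transition, start, n, rng):
    """Walk n states of a chain given by a row-stochastic matrix."""
    states = range(len(transition))
    path = [start]
    for _ in range(n - 1):
        row = transition[path[-1]]
        path.append(rng.choices(states, weights=row, k=1)[0])
    return path


def lazy_cycle(m):
    """Hypothetical example chain: lazy random walk on a cycle of m states.

    Stay with probability 1/2, step left or right with probability 1/4.
    The matrix is doubly stochastic, so the stationary law is uniform.
    """
    matrix = [[0.0] * m for _ in range(m)]
    for i in range(m):
        matrix[i][i] = 0.5
        matrix[i][(i - 1) % m] += 0.25
        matrix[i][(i + 1) % m] += 0.25
    return matrix


if __name__ == "__main__":
    rng = random.Random(0)
    m, n = 50, 100
    uniform = [1.0 / m] * m  # assumed symbol weights (stationary law)
    iid = sample_iid(uniform, n, rng)
    walk = sample_markov(lazy_cycle(m), 0, n, rng)
    print("IID missing mass:   ", missing_mass(uniform, iid))
    print("Markov missing mass:", missing_mass(uniform, walk))
```

Because consecutive states of the walk are strongly correlated, the chain tends to cover fewer symbols than an IID sample of the same length, which is the kind of dependence the abstract says the IID analyses do not handle.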
Submission history
From: Maciej Skorski
[v1] Fri, 10 Jan 2020 18:36:08 GMT (14kb)
[v2] Tue, 14 Jan 2020 18:49:09 GMT (76kb)