Title: A New Approach for Testing Properties of Discrete Distributions

Abstract: In this work, we give a novel general approach for distribution testing. We describe two techniques: our first technique gives sample-optimal testers, while our second technique gives matching sample lower bounds. As a consequence, we resolve the sample complexity of a wide variety of testing problems.
Our upper bounds are obtained via a modular reduction-based approach. Our approach yields optimal testers for numerous problems by using a standard $\ell_2$-identity tester as a black-box. Using this recipe, we obtain simple estimators for a wide range of problems, encompassing most problems previously studied in the TCS literature, namely: (1) identity testing to a fixed distribution, (2) closeness testing between two unknown distributions (with equal/unequal sample sizes), (3) independence testing (in any number of dimensions), (4) closeness testing for collections of distributions, and (5) testing histograms. For all of these problems, our testers are sample-optimal, up to constant factors. With the exception of (1), ours are the {\em first sample-optimal testers for the corresponding problems.} Moreover, our estimators are significantly simpler to state and analyze compared to previous results.
As an application of our reduction-based technique, we obtain the first {\em nearly instance-optimal} algorithm for testing equivalence between two {\em unknown} distributions. Moreover, our technique naturally generalizes to other metrics beyond the $\ell_1$-distance.
Our lower bounds are obtained via a direct information-theoretic approach: Given a candidate hard instance, our proof proceeds by bounding the mutual information between appropriate random variables. While this is a classical method in information theory, prior to our work, it had not been used in distribution property testing.
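
To make the black-box idea concrete, here is a minimal sketch of one statistic commonly used for $\ell_2$ closeness testing in the distribution-testing literature. The abstract does not say which $\ell_2$ tester the paper uses, so this choice is an assumption, and the function name `l2_closeness_statistic` is hypothetical. If the per-symbol counts $X_i, Y_i$ are independent Poisson variables with means $m p_i$ and $m q_i$, then $Z = \sum_i \left[(X_i - Y_i)^2 - X_i - Y_i\right]$ has expectation $m^2 \|p - q\|_2^2$, so a tester would compare $Z$ against a threshold. The threshold and the reductions described above are not shown.

```python
from collections import Counter


def l2_closeness_statistic(samples_p, samples_q):
    """Return Z = sum_i [(X_i - Y_i)^2 - X_i - Y_i].

    X_i and Y_i are the number of times symbol i occurs in each sample.
    Illustrative sketch only; not taken from the paper.
    """
    x = Counter(samples_p)
    y = Counter(samples_q)
    z = 0
    # Symbols seen in neither sample contribute 0, so the union suffices.
    for symbol in set(x) | set(y):
        diff = x[symbol] - y[symbol]
        z += diff * diff - x[symbol] - y[symbol]
    return z
```

Samples with disjoint supports give a large positive value, while samples drawn from the same distribution give values concentrated around zero.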
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT); Statistics Theory (math.ST)
Cite as: arXiv:1601.05557 [cs.DS]
  (or arXiv:1601.05557v2 [cs.DS] for this version)

Submission history

From: Ilias Diakonikolas
[v1] Thu, 21 Jan 2016 09:06:17 GMT (25kb)
[v2] Mon, 9 May 2016 06:55:09 GMT (32kb)
