Mathematics > Statistics Theory
Title: Sample complexity of the distinct elements problem
(Submitted on 11 Dec 2016 (v1), last revised 15 Jan 2018 (this version, v2))
Abstract: We consider the distinct elements problem, where the goal is to estimate the number of distinct colors in an urn containing $k$ balls based on $n$ samples drawn with replacement. Based on discrete polynomial approximation and interpolation, we propose an estimator with an additive error guarantee that achieves the optimal sample complexity within $O(\log\log k)$ factors, and in fact within constant factors for most cases. The estimator can be computed in $O(n)$ time for an accurate estimation. The result also applies to sampling without replacement, provided the sample size is a vanishing fraction of the urn size.
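To make the sampling model concrete, the following is a minimal simulation sketch of the setup described above: an urn of $k$ colored balls and $n$ draws with replacement. It is not the paper's estimator. The function names (`make_urn`, `sample_with_replacement`, `naive_distinct_estimate`) and the parameter values are hypothetical, and the estimate shown is only the naive plug-in count of colors observed in the sample, which can never exceed the true number of distinct colors.

```python
import random


def make_urn(k, num_colors, seed=0):
    """Build an urn of k balls that uses exactly num_colors distinct colors.

    Hypothetical helper for illustration; colors are the integers 0..num_colors-1.
    """
    if not 1 <= num_colors <= k:
        raise ValueError("need 1 <= num_colors <= k")
    rng = random.Random(seed)
    # Put one ball of every color in first, then fill the rest at random,
    # so the true number of distinct colors is exactly num_colors.
    urn = list(range(num_colors))
    urn += [rng.randrange(num_colors) for _ in range(k - num_colors)]
    rng.shuffle(urn)
    return urn


def sample_with_replacement(urn, n, seed=1):
    """Draw n balls uniformly at random from the urn, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(urn) for _ in range(n)]


def naive_distinct_estimate(sample):
    """Naive plug-in estimate: the number of distinct colors seen in the sample.

    This is a baseline only, not the polynomial-based estimator of the paper.
    """
    return len(set(sample))


urn = make_urn(k=1000, num_colors=200)
true_distinct = len(set(urn))
sample = sample_with_replacement(urn, n=300)
estimate = naive_distinct_estimate(sample)

print("true distinct colors:", true_distinct)
print("naive estimate:      ", estimate)
print("absolute error:      ", true_distinct - estimate)
```

Because unseen colors are simply ignored, this baseline is biased downward whenever $n$ is small relative to $k$, which is the regime where a more careful estimator is needed.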
One of the key auxiliary results is a sharp bound on the minimum singular value of a real rectangular Vandermonde matrix, which may be of independent interest.
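For orientation, the quantity in question can be written as follows under one common convention. The symbols $x_i$, $m$, $d$, and $V$ are assumptions introduced here for illustration; the paper's own indexing and orientation of the matrix may differ.

\[
V = \bigl(x_i^{\,j}\bigr)_{1 \le i \le m,\; 0 \le j \le d} \in \mathbb{R}^{m \times (d+1)},
\qquad
\sigma_{\min}(V) = \min_{\|c\|_2 = 1} \|Vc\|_2 , \qquad m \ge d+1 .
\]

Here $Vc$ is the vector of values $p(x_1), \dots, p(x_m)$ of the polynomial $p(x) = \sum_{j=0}^{d} c_j x^j$, so a lower bound on $\sigma_{\min}(V)$ says that no polynomial of degree at most $d$ with unit-norm coefficients can be simultaneously small at all of the nodes.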
Submission history
From: Pengkun Yang
[v1] Sun, 11 Dec 2016 06:17:40 GMT (68kb,D)
[v2] Mon, 15 Jan 2018 03:03:03 GMT (70kb,D)