
Title: Distinct Elements in Streams: An Algorithm for the (Text) Book

Abstract: Given a data stream $\mathcal{D} = \langle a_1, a_2, \ldots, a_m \rangle$ of $m$ elements where each $a_i \in [n]$, the Distinct Elements problem is to estimate the number of distinct elements in $\mathcal{D}$.
Distinct Elements has been the subject of theoretical and empirical investigation over the past four decades, resulting in space-optimal algorithms for it.
All the current state-of-the-art algorithms are, however, beyond the reach of an undergraduate textbook owing to their reliance on notions such as pairwise independence and universal hash functions. We present a simple, intuitive, sampling-based, space-efficient algorithm whose description and proof are accessible to undergraduates with knowledge of basic probability theory.
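To make the idea of a sampling-based estimator concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a transcription of the paper's algorithm: the abstract does not spell out the procedure, so the function name `estimate_distinct`, the parameter `thresh` (a cap on the buffer size), and the choice to halve the sampling probability when the buffer fills are all assumptions made for this example. How `thresh` should depend on the desired accuracy and confidence is not stated in the abstract and is left to the caller.

```python
import random


def estimate_distinct(stream, thresh, rng=None):
    """Estimate the number of distinct items in `stream` by sampling.

    Assumed design (not taken from the abstract): keep a buffer of at most
    `thresh` items, each retained with the current probability `p`.
    Returns None if a halving round fails to shrink the buffer.
    """
    if rng is None:
        rng = random.Random()
    p = 1.0          # current sampling probability
    sample = set()   # buffer of sampled items
    for item in stream:
        # Forget any earlier decision about this item, then re-sample it,
        # so only the item's last occurrence determines its membership.
        sample.discard(item)
        if rng.random() < p:
            sample.add(item)
        if len(sample) >= thresh:
            # Buffer is full: keep each item with probability 1/2
            # and halve the sampling probability to match.
            sample = {x for x in sample if rng.random() < 0.5}
            p /= 2.0
            if len(sample) >= thresh:
                return None  # extremely unlikely failure case
    # Each distinct item is in the buffer with probability p,
    # so the buffer size scaled by 1/p estimates the distinct count.
    return len(sample) / p


if __name__ == "__main__":
    stream = [i % 100 for i in range(10_000)]
    print(estimate_distinct(stream, thresh=1000, rng=random.Random(0)))
```

When the number of distinct items stays below `thresh`, the buffer never fills, `p` remains 1, and the sketch returns the exact count. For larger streams the result is a randomized estimate, and the memory used is bounded by `thresh` rather than by the number of distinct items.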
Comments: The earlier version of the paper, as published at ESA-22, contained an error in the proof of Claim 4. The current revised version fixes that error. The authors decided to forgo the old convention of alphabetical ordering of authors in favor of a randomized ordering, denoted by \textcircled{r}.
Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
Journal reference: Appeared in Proceedings of the 30th Annual European Symposium on Algorithms (ESA 2022)
DOI: 10.4230/LIPIcs.ESA.2022.34
Cite as: arXiv:2301.10191 [cs.DS]
  (or arXiv:2301.10191v1 [cs.DS] for this version)

Submission history

From: Kuldeep S. Meel
[v1] Tue, 24 Jan 2023 18:08:03 GMT (21kb,D)
