We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.DS

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Data Structures and Algorithms

Title: Subspace exploration: Bounds on Projected Frequency Estimation

Abstract: Given an $n \times d$ dimensional dataset $A$, a projection query specifies a subset $C \subseteq [d]$ of columns which yields a new $n \times |C|$ array. We study the space complexity of computing data analysis functions over such subspaces, including heavy hitters and norms, when the subspaces are revealed only after observing the data. We show that this important class of problems is typically hard: for many problems, we show $2^{\Omega(d)}$ lower bounds. However, we present upper bounds which demonstrate space dependency better than $2^d$. That is, for $c,c' \in (0,1)$ and a parameter $N=2^d$ an $N^c$-approximation can be obtained in space $\min(N^{c'},n)$, showing that it is possible to improve on the na\"{i}ve approach of keeping information for all $2^d$ subsets of $d$ columns. Our results are based on careful constructions of instances using coding theory and novel combinatorial reductions that exhibit such space-approximation tradeoffs.
Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
Cite as: arXiv:2101.07546 [cs.DS]
  (or arXiv:2101.07546v1 [cs.DS] for this version)

Submission history

From: Charlie Dickens [view email]
[v1] Tue, 19 Jan 2021 10:18:22 GMT (170kb,D)

Link back to: arXiv, form interface, contact.