On Adaptive Distance Estimation

Cherapanamjeri, Yeshwanth; Nelson, Jelani

(Submitted on 21 Oct 2020 (v1), last revised 16 Dec 2020 (this version, v2))

Abstract: We provide a static data structure for distance estimation which supports {\it adaptive} queries. Concretely, given a dataset $X = \{x_i\}_{i = 1}^n$ of $n$ points in $\mathbb{R}^d$ and $0 < p \leq 2$, we construct a randomized data structure with low memory consumption and query time which, when later given any query point $q \in \mathbb{R}^d$, outputs a $(1+\epsilon)$-approximation of $\lVert q - x_i \rVert_p$ with high probability for all $i\in[n]$. The main novelty is that our data structure's correctness guarantee holds even when the sequence of queries is chosen adaptively: an adversary is allowed to choose the $j$th query point $q_j$ in a way that depends on the answers reported by the data structure for $q_1,\ldots,q_{j-1}$. Previous randomized Monte Carlo methods do not provide error guarantees in the setting of adaptively chosen queries. Our memory consumption is $\tilde O((n+d)d/\epsilon^2)$, slightly more than the $O(nd)$ required to store $X$ in memory explicitly, but with the benefit that our time to answer queries is only $\tilde O(\epsilon^{-2}(n + d))$, much faster than the naive $\Theta(nd)$ time obtained from a linear scan when $n$ and $d$ are very large. Here $\tilde O$ hides $\log(nd/\epsilon)$ factors. We discuss applications to nearest neighbor search and nonparametric estimation.
Our method is simple and likely to be applicable to other domains: we describe a generic approach for transforming randomized Monte Carlo data structures which do not support adaptive queries to ones that do, and show that for the problem at hand, it can be applied to standard nonadaptive solutions to $\ell_p$ norm estimation with negligible overhead in query time and a factor $d$ overhead in memory.
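The abstract names two ingredients: a standard nonadaptive $\ell_p$ estimator and a generic wrapper that makes such estimators safe for adaptive queries. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. The base estimator is assumed to be Indyk-style random $p$-stable projections with a median estimator (a standard nonadaptive technique for $0 < p \leq 2$). The wrapper assumes the transformation keeps many independent copies and answers each query with the median over a small subset of copies drawn with fresh randomness. The names `PStableSketch`, `AdaptiveDistanceEstimator`, `k`, `num_copies`, and `per_query` are hypothetical, and the code makes no claim to the paper's adaptive-security guarantee.

```python
import math
import random
import statistics


def p_stable(p, rng):
    # Chambers-Mallows-Stuck sampler for a standard symmetric p-stable variate, 0 < p <= 2.
    theta = rng.uniform(-math.pi / 2, math.pi / 2)
    w = max(rng.expovariate(1.0), 1e-300)  # guard against an exact zero draw
    if p == 1.0:
        return math.tan(theta)  # p = 1 is the Cauchy distribution
    return (math.sin(p * theta) / math.cos(theta) ** (1.0 / p)) * (
        math.cos((1.0 - p) * theta) / w
    ) ** ((1.0 - p) / p)


def median_abs_p_stable(p, rng, samples=20001):
    # Empirical median of |Z| for standard p-stable Z; used to rescale the estimator.
    return statistics.median(abs(p_stable(p, rng)) for _ in range(samples))


class PStableSketch:
    """Hypothetical nonadaptive estimator: k random p-stable projections of each x_i."""

    def __init__(self, X, p, k, rng, scale):
        d = len(X[0])
        self.rows = [[p_stable(p, rng) for _ in range(d)] for _ in range(k)]
        self.sketches = [self._project(x) for x in X]  # stores S x_i for every point
        self.scale = scale

    def _project(self, v):
        return [sum(a * b for a, b in zip(row, v)) for row in self.rows]

    def query(self, q):
        sq = self._project(q)
        # By p-stability, each coordinate of S(q - x_i) is ||q - x_i||_p times a
        # standard p-stable variate, so the median of |.| recovers the distance.
        return [
            statistics.median(abs(a - b) for a, b in zip(sq, sx)) / self.scale
            for sx in self.sketches
        ]


class AdaptiveDistanceEstimator:
    """Hypothetical wrapper: many independent copies, a fresh random subset per query."""

    def __init__(self, X, p, k, num_copies, per_query, seed=0):
        self.rng = random.Random(seed)
        scale = median_abs_p_stable(p, self.rng)
        self.copies = [PStableSketch(X, p, k, self.rng, scale) for _ in range(num_copies)]
        self.per_query = per_query

    def query(self, q):
        # Which copies answer is decided with fresh randomness for every query.
        chosen = self.rng.sample(self.copies, self.per_query)
        answers = [c.query(q) for c in chosen]
        # Coordinate-wise median over the sampled copies, one value per data point.
        return [statistics.median(col) for col in zip(*answers)]
```

In this sketch a query costs about `per_query * k * (n + d)` arithmetic operations and the structure stores about `num_copies * k * (n + d)` numbers. Under the illustrative assumption that `k` grows like $\epsilon^{-2}\log n$, `per_query` is logarithmic, and `num_copies` grows like $d$ up to logarithmic factors, these counts have the same shape as the $\tilde O(\epsilon^{-2}(n + d))$ query time and the factor $d$ memory overhead stated above.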

Comments:	Minor correction in proof of Lemma B.6
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:2010.11252 [cs.DS]
	(or arXiv:2010.11252v2 [cs.DS] for this version)

Submission history

From: Yeshwanth Cherapanamjeri
[v1] Wed, 21 Oct 2020 19:12:57 GMT (380kb,D)
[v2] Wed, 16 Dec 2020 07:16:56 GMT (381kb,D)