Title: Binarized Johnson-Lindenstrauss embeddings

Abstract: We consider the problem of encoding a set of vectors into a minimal number of bits while preserving information on their Euclidean geometry. We show that this task can be accomplished by applying a Johnson-Lindenstrauss embedding and subsequently binarizing each vector by comparing each entry of the vector to a uniformly random threshold. Using this simple construction, we produce two encodings of a dataset from which Euclidean information about any pair of points can be queried, up to a desired additive error, using a small number of bit operations: Euclidean distances in the first case, and inner products and squared Euclidean distances in the second. In the latter case, each point is encoded in near-linear time. The number of bits required for these encodings is quantified in terms of two natural complexity parameters of the dataset, its covering numbers and its localized Gaussian complexity, and is shown to be near-optimal.
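To make the first encoding concrete, here is a minimal sketch of the "embed, then compare each entry to a uniformly random threshold" idea. It assumes a dense Gaussian matrix as the Johnson-Lindenstrauss map, thresholds drawn uniformly from [-λ, λ] with λ large compared with the projected entries, and a Hamming-distance decoder. The function names, the parameter `lam`, and the decoder are illustrative assumptions, not necessarily the paper's exact construction or its error guarantees. The decoder relies on two facts. When both projected entries lie in [-λ, λ], the two bits differ with probability |(Ax)_i - (Ay)_i| / (2λ). For a standard Gaussian row g, E|⟨g, z⟩| = sqrt(2/π)·‖z‖₂.

```python
import math
import random


def gaussian_matrix(m, n, rng):
    """m x n matrix with i.i.d. N(0, 1) entries, used here as the JL map (assumption)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]


def matvec(A, x):
    """Plain matrix-vector product A @ x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def binary_encode(A, taus, x):
    """Encode x as m bits: bit i is 1 iff (Ax)_i + tau_i >= 0,
    i.e. entry i of the projection is compared to the random threshold -tau_i."""
    return [1 if v + t >= 0 else 0 for v, t in zip(matvec(A, x), taus)]


def estimate_distance(bx, by, lam):
    """Estimate ||x - y||_2 from the fraction of differing bits.

    If |(Ax)_i| and |(Ay)_i| are at most lam, the bits differ with probability
    |(Ax)_i - (Ay)_i| / (2 * lam), and E|<g, z>| = sqrt(2 / pi) * ||z||_2 for a
    standard Gaussian row g, so the expected fraction of differing bits is
    sqrt(2 / pi) * ||x - y||_2 / (2 * lam). This function inverts that relation.
    """
    frac = sum(b1 != b2 for b1, b2 in zip(bx, by)) / len(bx)
    return math.sqrt(math.pi / 2) * 2 * lam * frac


if __name__ == "__main__":
    n, m, lam = 10, 8000, 4.0  # lam should dominate the projected entries
    rng = random.Random(0)
    A = gaussian_matrix(m, n, rng)
    taus = [rng.uniform(-lam, lam) for _ in range(m)]  # one threshold per row
    x = [1.0] + [0.0] * (n - 1)
    y = [0.0, 1.0] + [0.0] * (n - 2)
    bx, by = binary_encode(A, taus, x), binary_encode(A, taus, y)
    # Randomized estimate vs. exact distance (about 1.414).
    print(estimate_distance(bx, by, lam), math.dist(x, y))
```

All points must be encoded with the same matrix and the same thresholds. A query then reduces to counting differing bits, which can be done with XOR and popcount on packed words. Entries that exceed λ in absolute value bias the estimate, so in this sketch λ trades clipping bias against the number of bits m needed for a given additive error.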
Comments: The results of this preprint have been substantially improved and expanded. The current preprint is no longer intended for publication and has been replaced by two new preprints, posted as arXiv:2201.05204 and arXiv:2204.04109.
Subjects: Information Theory (cs.IT); Data Structures and Algorithms (cs.DS); Metric Geometry (math.MG)
Cite as: arXiv:2009.08320 [cs.IT]
  (or arXiv:2009.08320v3 [cs.IT] for this version)

Submission history

From: Sjoerd Dirksen
[v1] Thu, 17 Sep 2020 14:12:40 GMT (27kb)
[v2] Sun, 23 Jan 2022 19:08:38 GMT (27kb)
[v3] Mon, 11 Apr 2022 14:00:57 GMT (27kb)