Title: PFO: A Parallel Friendly High Performance System for Online Query and Update of Nearest Neighbors

Abstract: Nearest neighbor search is a fundamental computational primitive for working with massive datasets. Locality Sensitive Hashing (LSH) has been a key tool for nearest neighbor search in high-dimensional spaces. However, traditional LSH systems cannot serve online big data systems that must handle a large volume of query/update requests, because most of them optimize query efficiency under the assumption of infrequent updates and lack a parallel-friendly design. As a result, state-of-the-art LSH systems cannot adapt their responses to user behavior interactively.
In this paper, we propose a new LSH system called PFO. It handles query/update requests in RAM and scales system capacity by using flash memory. To achieve high throughput on streaming data, PFO adopts a parallel-friendly indexing structure that preserves the distance between data points. Further, it accommodates inbound data in real time and dispatches update requests intelligently to eliminate cross-thread synchronization. We carried out extensive evaluations with large synthetic and standard benchmark datasets. The results demonstrate that PFO delivers shorter latency and offers scalable capacity compared with existing LSH systems. PFO achieves higher throughput than the state-of-the-art LSH indexing structure when handling online query/update requests for nearest neighbors. Meanwhile, PFO returns neighbors of much better quality, making it well suited to online big data applications such as streaming recommendation systems and interactive machine learning systems.
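To make the ideas in the abstract concrete, the following is a minimal sketch of a generic random-hyperplane LSH index whose buckets are partitioned into shards. It is not PFO's actual data structure, which the abstract does not specify. The class name `ShardedLSH`, the parameters `num_bits` and `num_shards`, and the rule of routing by signature modulo shard count are all assumptions chosen for illustration. The point it shows is that if each shard is owned by a single worker, updates routed to different shards never write to the same table.

```python
import math
import random
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ShardedLSH:
    """Toy random-hyperplane LSH index (hypothetical, not PFO itself)."""

    def __init__(self, dim, num_bits=8, num_shards=4, seed=0):
        rng = random.Random(seed)
        # One random hyperplane per signature bit.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(num_bits)]
        self.num_shards = num_shards
        # Each shard has its own bucket table. In a threaded setting one
        # worker would own one shard, so no table is shared between workers.
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def signature(self, vec):
        """Pack the sign of each hyperplane projection into an integer."""
        bits = 0
        for plane in self.planes:
            dot = sum(p * v for p, v in zip(plane, vec))
            bits = (bits << 1) | (1 if dot >= 0 else 0)
        return bits

    def shard_of(self, sig):
        """Assumed routing rule: signature modulo the number of shards."""
        return sig % self.num_shards

    def insert(self, key, vec):
        """Route the update to the single shard that owns its bucket."""
        sig = self.signature(vec)
        self.shards[self.shard_of(sig)][sig].append((key, vec))

    def query(self, vec, k=1):
        """Return up to k keys from the matching bucket, most similar first."""
        sig = self.signature(vec)
        candidates = self.shards[self.shard_of(sig)].get(sig, [])
        ranked = sorted(candidates, key=lambda kv: cosine(kv[1], vec),
                        reverse=True)
        return [key for key, _ in ranked[:k]]
```

A caller would build the index with `ShardedLSH(dim=3)`, add points with `insert(key, vec)`, and look up neighbors with `query(vec, k)`. Because the signature depends only on the signs of the projections, vectors pointing in the same direction always land in the same bucket, while a real system would typically use several hash tables to improve recall.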
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:1604.06984 [cs.DC]
  (or arXiv:1604.06984v3 [cs.DC] for this version)

Submission history

From: Nan Zhu
[v1] Sun, 24 Apr 2016 05:08:27 GMT (366kb,D)
[v2] Fri, 13 May 2016 23:16:52 GMT (362kb,D)
[v3] Sun, 22 May 2016 21:20:32 GMT (358kb,D)
