Title: Similarity Search on Computational Notebooks

Authors: Misato Horiuchi (1), Yuya Sasaki (1), Chuan Xiao (1), Makoto Onizuka (1) ((1) Osaka University)
Abstract: Computational notebook software such as Jupyter Notebook is popular for data science tasks. Numerous computational notebooks are available on the Web and are reusable; however, searching for them manually is tedious, and so far there are no tools that search for computational notebooks effectively and efficiently. In this paper, we propose a similarity search on computational notebooks and develop a new framework for it. Given contents of computational notebooks (i.e., source code, tabular data, libraries, and output formats) as a query, the similarity search problem aims to find the top-k computational notebooks with the most similar contents. We define two similarity measures: set-based and graph-based similarity. Set-based similarity handles each content type independently, while graph-based similarity captures the relationships between contents. Our framework effectively prunes candidate notebooks that cannot be in the top-k results. Furthermore, we develop optimization techniques such as caching and indexing to accelerate the search. Experiments using Kaggle notebooks show that our method, in particular with graph-based similarity, achieves high accuracy and high efficiency.
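
Illustration (not from the paper): the set-based top-k search described in the abstract might be sketched as follows. The abstract does not say how the similarity is computed, so this sketch assumes Jaccard similarity per content type, averaged over the content types present in the query. The names jaccard, set_based_similarity, and top_k, and the dictionary layout of a notebook, are hypothetical. The pruning, caching, and indexing mentioned in the abstract are not modeled; every notebook is scored.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def set_based_similarity(query, notebook):
    """Average per-content-type Jaccard over the content types in the query.

    Each content type (e.g. "code", "data", "libraries", "outputs") is scored
    independently of the others; this dict layout is a hypothetical choice.
    """
    types = [t for t in query if query[t]]
    if not types:
        return 0.0
    scores = (jaccard(query[t], notebook.get(t, ())) for t in types)
    return sum(scores) / len(types)


def top_k(query, notebooks, k):
    """Return the k (score, name) pairs with the highest similarity.

    Brute force over all notebooks: no pruning or indexing is attempted.
    """
    scored = ((set_based_similarity(query, nb), name)
              for name, nb in notebooks.items())
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    notebooks = {
        "a": {"libraries": {"pandas", "numpy"}, "code": {"read_csv"}},
        "b": {"libraries": {"pandas"}, "code": {"plot"}},
    }
    query = {"libraries": {"pandas", "numpy"}, "code": {"read_csv"}}
    print(top_k(query, notebooks, 1))
```

A graph-based measure would additionally compare how contents relate to each other (for example, which code consumes which table), which this set-based sketch deliberately ignores.
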
Comments: 11 pages, 5 figures
Subjects: Information Retrieval (cs.IR)
Cite as: arXiv:2201.12786 [cs.IR]
  (or arXiv:2201.12786v1 [cs.IR] for this version)

Submission history

From: Misato Horiuchi
[v1] Sun, 30 Jan 2022 11:07:12 GMT (7391kb,D)
