We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.SE

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Software Engineering

Title: ReSplit: Improving the Structure of Jupyter Notebooks by Re-Splitting Their Cells

Abstract: Jupyter notebooks represent a unique format for programming - a combination of code and Markdown with rich formatting, separated into individual cells. We propose to perceive a Jupyter Notebook cell as a simplified and raw version of a programming function. Similar to functions, Jupyter cells should strive to contain singular, self-contained actions. At the same time, research shows that real-world notebooks fail to do so and suffer from the lack of proper structure.
To combat this, we propose ReSplit, an algorithm for an automatic re-splitting of cells in Jupyter notebooks. The algorithm analyzes definition-usage chains in the notebook and consists of two parts - merging and splitting the cells. We ran the algorithm on a large corpus of notebooks to evaluate its performance and its overall effect on notebooks, and evaluated it by human experts: we showed them several notebooks in their original and the re-split form. In 29.5% of cases, the re-split notebook was selected as the preferred way of perceiving the code. We analyze what influenced this decision and describe several individual cases in detail.
Comments: 5 pages, 2 figures
Subjects: Software Engineering (cs.SE)
Cite as: arXiv:2112.14825 [cs.SE]
  (or arXiv:2112.14825v1 [cs.SE] for this version)

Submission history

From: Yaroslav Golubev [view email]
[v1] Wed, 29 Dec 2021 21:15:30 GMT (59kb,D)

Link back to: arXiv, form interface, contact.