Computer Science > Distributed, Parallel, and Cluster Computing
Title: PaSh: Light-touch Data-Parallel Shell Processing
(Submitted on 18 Jul 2020 (v1), last revised 3 Apr 2021 (this version, v4))
Abstract: This paper presents PaSh, a system for parallelizing POSIX shell scripts. Given a script, PaSh converts it to a dataflow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the dataflow graph back into a script, one that adds POSIX constructs to explicitly guide parallelism, coupled with PaSh-provided Unix-aware runtime primitives for addressing performance- and correctness-related issues. A lightweight annotation language allows command developers to express key parallelizability properties about their commands. An accompanying parallelizability study of POSIX and GNU commands (two large and commonly used groups) guides the annotation language and the optimized aggregator library that PaSh uses. Finally, PaSh's extensive evaluation over 44 unmodified Unix scripts shows significant speedups (0.89-61.1×, avg: 6.7×) stemming from the combination of its program transformations and runtime primitives.
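As a rough, hand-written illustration of the kind of data parallelism the abstract describes, the sketch below splits an input file in two, runs the same pipeline on each half in the background, and merges the results. It is not PaSh output and does not use PaSh's runtime primitives or annotation language; the function names `sort_sequential` and `sort_parallel`, the two-way split, and the temporary-file layout are all assumptions made for the example. It also assumes that `tr` processes each line independently and that `sort -m` is a valid way to combine the two sorted halves. `mktemp -d` is widely available but is not specified by POSIX.

```sh
#!/bin/sh
# Illustrative sketch only: NOT generated by PaSh. All names are hypothetical.

# Sequential baseline: lowercase every line, then sort.
sort_sequential() {
    tr 'A-Z' 'a-z' < "$1" | sort
}

# Data-parallel variant of the same pipeline, with a fixed width of 2.
sort_parallel() {
    input=$1
    tmp=$(mktemp -d) || return 1      # scratch directory (mktemp is non-POSIX)

    # Split the input into two contiguous halves by line count.
    total=$(wc -l < "$input")
    half=$(( (total + 1) / 2 ))
    head -n "$half" "$input" > "$tmp/in0"
    tail -n +"$((half + 1))" "$input" > "$tmp/in1"

    # Run the pipeline on each half as a background job.
    tr 'A-Z' 'a-z' < "$tmp/in0" | sort > "$tmp/out0" &
    tr 'A-Z' 'a-z' < "$tmp/in1" | sort > "$tmp/out1" &
    wait                              # block until both halves are done

    # Aggregate: merge the two already-sorted partial results.
    sort -m "$tmp/out0" "$tmp/out1"
    rm -rf "$tmp"
}
```

Calling `sort_parallel words.txt` (with `words.txt` standing in for any line-oriented input file) should print the same lines as `sort_sequential words.txt`. The merge step is what keeps the two variants equivalent: simply concatenating the sorted halves would not, in general, yield sorted output.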
Submission history
From: Konstantinos Kallas
[v1] Sat, 18 Jul 2020 14:14:11 GMT (568kb,D)
[v2] Sun, 11 Oct 2020 20:24:41 GMT (697kb,D)
[v3] Mon, 4 Jan 2021 02:04:55 GMT (697kb,D)
[v4] Sat, 3 Apr 2021 16:02:11 GMT (468kb,D)