
Programming Languages

New submissions

[ total of 5 entries: 1-5 ]

New submissions for Wed, 29 Jun 22

[1]  arXiv:2206.13831 [pdf]
Title: Gradual Soundness: Lessons from Static Python
Authors: Kuang-Chen Lu (Brown University, USA), Ben Greenman (Brown University, USA), Carl Meyer (Meta, USA), Dino Viehland (Meta, USA), Aniket Panse (Meta, USA), Shriram Krishnamurthi (Brown University, USA)
Journal-ref: The Art, Science, and Engineering of Programming, 2023, Vol. 7, Issue 1, Article 2
Subjects: Programming Languages (cs.PL)

Context: Gradually-typed languages allow typed and untyped code to interoperate, but typically come with significant drawbacks. In some languages, the types are unreliable; in others, communication across type boundaries can be extremely expensive; and still others allow only limited forms of interoperability. The research community is actively seeking a sound, fast, and expressive approach to gradual typing.
Inquiry: This paper describes Static Python, a language developed by engineers at Instagram that has proven itself sound, fast, and reasonably expressive in production. Static Python's approach to gradual types is essentially a programmer-tunable combination of the concrete and transient approaches from the literature. Concrete types provide full soundness and low performance overhead, but impose nonlocal constraints. Transient types are sound in a shallow sense and easier to use; they help to bridge the gap between untyped code and typed concrete code.
Approach: We evaluate the language in its current state and develop a model that captures the essence of its approach to gradual types. We draw upon personal communication, bug reports, and the Static Python regression test suite to develop this model.
Knowledge: Our main finding is that the gradual soundness that arises from a mix of concrete and transient types is an effective way to lower the maintenance cost of the concrete approach. We also find that method-based JIT technology can eliminate the costs of the transient approach. On a more technical level, this paper describes two contributions: a model of Static Python and a performance evaluation of Static Python. The process of formalization found several errors in the implementation, including fatal errors.
Grounding: Our model of Static Python is implemented in PLT Redex and tested using property-based soundness tests and 265 tests from the Static Python regression suite. This paper includes a small core of the model to convey the main ideas of the Static Python approach and its soundness. Our performance claims are based on production experience in the Instagram web server. Migrations to Static Python in the server have caused a 3.7% increase in requests handled per second at maximum CPU load.
Importance: Static Python is the first sound gradual language whose piecemeal application to a realistic codebase has consistently improved performance. Other language designers may wish to replicate its approach, especially those who currently maintain unsound gradual languages and are seeking a path to soundness.
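Illustration: The difference between shallow (transient) and full (concrete) checks described in the abstract above can be sketched in plain Python. This is only an illustrative sketch, not Static Python's implementation; the names `transient_list_of_int` and `ConcreteIntList` are hypothetical and do not come from the paper.

```python
from typing import Any, Iterable, List


def transient_list_of_int(value: Any) -> list:
    """Shallow, transient-style boundary check (hypothetical helper).

    Only the outermost constructor is inspected, so a list holding
    non-integers still passes.
    """
    if not isinstance(value, list):
        raise TypeError(f"expected list, got {type(value).__name__}")
    return value


class ConcreteIntList:
    """Concrete-style container (hypothetical class).

    Every element is checked when it is written, so a reader can rely on
    the element type without further checks.
    """

    def __init__(self, items: Iterable[Any] = ()) -> None:
        self._items: List[int] = []
        for item in items:
            self.append(item)

    def append(self, item: Any) -> None:
        if not isinstance(item, int):
            raise TypeError(f"expected int, got {type(item).__name__}")
        self._items.append(item)

    def __getitem__(self, index: int) -> int:
        return self._items[index]
```

In this sketch a mixed list such as `[1, "two"]` passes the shallow check unchanged, while constructing a `ConcreteIntList` from it raises `TypeError`. The stronger guarantee comes at a price: callers must build the checked container instead of passing an ordinary list, which is one way to read the "nonlocal constraints" the abstract mentions.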

[2]  arXiv:2206.14091 [pdf]
Title: Compilation Forking: A Fast and Flexible Way of Generating Data for Compiler-Internal Machine Learning Tasks
Authors: Raphael Mosaner (JKU Linz, Austria), David Leopoldseder (Oracle Labs Vienna, Austria), Wolfgang Kisling (JKU Linz, Austria), Lukas Stadler (Oracle Labs Linz, Austria), Hanspeter Mössenböck (JKU Linz, Austria)
Journal-ref: The Art, Science, and Engineering of Programming, 2023, Vol. 7, Issue 1, Article 3
Subjects: Programming Languages (cs.PL)

Compiler optimization decisions are often based on hand-crafted heuristics centered around a few established benchmark suites. Alternatively, they can be learned from feature and performance data produced during compilation. However, data-driven compiler optimizations based on machine learning models require large sets of quality data for training in order to match or even outperform existing human-crafted heuristics. In static compilation setups, related work has addressed this problem with iterative compilation. However, a dynamic compiler may produce different data depending on dynamically-chosen compilation strategies, which complicates the generation of comparable data. We propose compilation forking, a technique for generating consistent feature and performance data from arbitrary, dynamically-compiled programs. Different versions of program parts with the same profiling and compilation history are executed within single program runs to minimize noise stemming from dynamic compilation and the runtime environment. Our approach facilitates large-scale performance evaluations of compiler optimization decisions. Additionally, compilation forking supports creating domain-specific compilation strategies based on machine learning by providing the data for model training. We implemented compilation forking in the GraalVM compiler in a programming-language-agnostic way. To assess the quality of the generated data, we trained several machine learning models to replace compiler heuristics for loop-related optimizations. The trained models perform as well as the highly-tuned compiler heuristics when comparing the geometric means of benchmark suite performances. Larger impacts on a few individual benchmarks range from speedups of 20% to slowdowns of 17%. The presented approach can be implemented in any dynamic compiler. We believe that it can help to analyze compilation decisions and bring machine learning into dynamic compilation.

Cross-lists for Wed, 29 Jun 22

[3]  arXiv:2206.14148 (cross-list from cs.LG) [pdf, other]
Title: Memory Safe Computations with XLA Compiler
Comments: Preprint
Subjects: Machine Learning (cs.LG); Programming Languages (cs.PL); Machine Learning (stat.ML)

Software packages like TensorFlow and PyTorch are designed to support linear algebra operations, and their speed and usability determine their success. However, by prioritising speed, they often neglect memory requirements. As a consequence, implementations of memory-intensive algorithms that are convenient in terms of software design often cannot be run for large problems due to memory overflows. Memory-efficient solutions require complex programming approaches with significant logic outside the computational framework. This impairs the adoption and use of such algorithms. To address this, we developed an XLA compiler extension that adjusts the computational data-flow representation of an algorithm according to a user-specified memory limit. We show that k-nearest neighbour and sparse Gaussian process regression methods can be run at a much larger scale on a single device, where standard implementations would have failed. Our approach leads to better use of hardware resources. We believe that further focus on removing memory constraints at a compiler level will widen the range of machine learning methods that can be developed in the future.
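Illustration: The abstract above says that memory-efficient solutions usually need extra logic outside the computational framework. A minimal sketch of that kind of hand-written logic, for k-nearest neighbour under a memory budget, might look as follows. It is not the XLA extension from the paper; the function `knn_chunked` and the parameter `max_distances_in_memory` are hypothetical, and the sketch uses pure Python with squared Euclidean distance.

```python
import heapq
from typing import List, Sequence

Point = Sequence[float]


def knn_chunked(
    queries: Sequence[Point],
    data: Sequence[Point],
    k: int,
    max_distances_in_memory: int,
) -> List[List[int]]:
    """Return the indices of the k nearest data points for each query.

    Instead of materialising the full len(queries) x len(data) distance
    matrix, queries are processed in chunks so that at most roughly
    max_distances_in_memory distances exist at once (hypothetical budget).
    """
    if not data:
        raise ValueError("data must not be empty")
    # Number of query rows whose distance rows fit within the budget.
    chunk_rows = max(1, max_distances_in_memory // len(data))
    results: List[List[int]] = []
    for start in range(0, len(queries), chunk_rows):
        chunk = queries[start:start + chunk_rows]
        # Distance block for this chunk only.
        block = [
            [sum((a - b) ** 2 for a, b in zip(q, x)) for x in data]
            for q in chunk
        ]
        for row in block:
            nearest = heapq.nsmallest(k, range(len(data)), key=row.__getitem__)
            results.append(nearest)
    return results
```

The result does not depend on the budget; only the size of the intermediate distance block changes. The paper's stated aim is to have the compiler apply this kind of restructuring automatically, given a user-specified memory limit.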

[4]  arXiv:2206.14175 (cross-list from cs.SE) [pdf, other]
Title: InvAASTCluster: On Applying Invariant-Based Program Clustering to Introductory Programming Assignments
Comments: 21 pages, 6 Figures, 4 Tables. GitHub repo: this https URL
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Programming Languages (cs.PL)

Due to the vast number of students enrolled in Massive Open Online Courses (MOOCs), there has been an increasing number of automated program repair techniques focused on introductory programming assignments (IPAs). Such state-of-the-art techniques use program clustering to take advantage of previous correct student implementations to repair a given new incorrect submission. Usually, these repair techniques use clustering methods since analyzing all available correct student submissions to repair a program is not feasible. The clustering methods use program representations based on several features such as abstract syntax tree (AST), syntax, control flow, and data flow. However, these features are sometimes brittle when representing semantically similar programs.
This paper proposes InvAASTCluster, a novel approach for program clustering that takes advantage of dynamically generated program invariants observed over several program executions to cluster semantically equivalent IPAs. Our main objective is to find a more suitable representation of programs using a combination of the program's semantics, through its invariants, and its structure, through its anonymized abstract syntax tree. The evaluation of InvAASTCluster shows that the proposed program representation outperforms syntax-based representations when clustering a set of different correct IPAs. Furthermore, we integrate InvAASTCluster into a state-of-the-art clustering-based program repair tool and evaluate it on a set of IPAs. Our results show that InvAASTCluster advances the current state-of-the-art when used by clustering-based program repair tools by repairing a larger number of students' programs in a shorter amount of time.

Replacements for Wed, 29 Jun 22

[5]  arXiv:2110.00602 (replaced) [pdf, other]
Title: Applied Measure Theory for Probabilistic Modeling
Subjects: Computation (stat.CO); Programming Languages (cs.PL)
