
Software Engineering

New submissions

[ total of 14 entries: 1-14 ]

New submissions for Mon, 24 Jan 22

[1]  arXiv:2201.08452 [pdf, ps, other]
Title: npm-filter: Automating the mining of dynamic information from npm packages
Comments: 5 pages; this work is being submitted to the MSR tool track
Subjects: Software Engineering (cs.SE)

The static properties of code repositories (e.g., lines of code, dependents, and dependencies) can be readily scraped from code hosting platforms such as GitHub and from package management systems such as npm for JavaScript. Although no less important, information about the dynamic properties of programs (e.g., the number of tests in a test suite that pass or fail) is less readily available. This dynamic information could be immensely useful to researchers conducting corpus analyses, as it would let them differentiate projects based on properties that can only be observed by running them.
In this paper, we present npm-filter, an automated tool that can download, install, build, test, and run custom user scripts over the source code of JavaScript projects available on npm, the most popular JavaScript package manager. We outline this tool, describe its implementation, and show that npm-filter has already been useful in developing evaluation suites for multiple JavaScript tools.
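As a rough illustration of the kind of dynamic information involved, the sketch below runs `npm test` in a package checkout and extracts pass/fail/pending counts from the test runner's summary. It is not npm-filter's implementation: the helper names (`run_npm_test`, `parse_test_output`, `RunSummary`) are hypothetical, and the sketch assumes the package uses Mocha or Jest with their default summary lines.

```python
import re
import subprocess
from dataclasses import dataclass


@dataclass
class RunSummary:
    passing: int = 0
    failing: int = 0
    pending: int = 0


# Mocha's default reporters end with lines such as "  12 passing (34ms)",
# "  2 failing" and "  1 pending".
_MOCHA = re.compile(r"^\s*(\d+)\s+(passing|failing|pending)\b", re.MULTILINE)

# Jest prints a summary line such as "Tests:  1 failed, 2 skipped, 7 passed, 10 total".
_JEST_LINE = re.compile(r"^Tests:\s+(.*)$", re.MULTILINE)
_JEST_ITEM = re.compile(r"(\d+)\s+(passed|failed|skipped|todo)")


def parse_test_output(output: str) -> RunSummary:
    """Extract test counts from captured `npm test` output (hypothetical helper)."""
    summary = RunSummary()
    # Mocha keywords coincide with the RunSummary field names.
    for count, kind in _MOCHA.findall(output):
        setattr(summary, kind, int(count))
    jest = _JEST_LINE.search(output)
    if jest:
        for count, kind in _JEST_ITEM.findall(jest.group(1)):
            n = int(count)
            if kind == "passed":
                summary.passing += n
            elif kind == "failed":
                summary.failing += n
            else:  # skipped and todo tests are counted as pending
                summary.pending += n
    return summary


def run_npm_test(project_dir: str) -> RunSummary:
    """Run `npm test` in an already-installed package checkout (sketch)."""
    proc = subprocess.run(["npm", "test"], cwd=project_dir,
                          capture_output=True, text=True, timeout=600)
    # Jest reports to stderr and Mocha to stdout, so scan both streams.
    return parse_test_output(proc.stdout + "\n" + proc.stderr)
```

Any test runner other than Mocha or Jest would simply yield zero counts here, so a real tool needs one parser per supported runner.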

[2]  arXiv:2201.08570 [pdf, other]
Title: An empirical study on Java method name suggestion: are we there yet?
Subjects: Software Engineering (cs.SE)

Large-scale evaluations of current method-naming approaches indicate that these approaches are accurate. However, less is known about which categories of method names they handle well and how they perform on each category. To examine whether the current naming approaches are truly superior, in this paper we conduct an empirical study of these approaches on a new dataset. Moreover, we analyze the successful recommendations and find that (1) around 60% of the accepted recommended names start with one of the prefixes get, set, is, or test, and (2) a large portion (19.3%) of the successfully recommended method names could be derived from the given method bodies. The comparisons also demonstrate the superior performance of the empirical study.
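The two findings can be made concrete with simple heuristics. The sketch below splits camelCase Java method names into sub-tokens, measures the share of names whose first sub-token is get, set, is, or test, and checks whether every sub-token of a name already occurs among the identifiers of the method body. The functions are hypothetical and are not the study's actual classification procedure.

```python
import re
from typing import List

COMMON_PREFIXES = ("get", "set", "is", "test")


def split_camel_case(name: str) -> List[str]:
    """Split an identifier such as 'getHTTPResponse' into lower-case sub-tokens."""
    # Acronym runs ("HTTP"), capitalized or lower-case words, and digit runs.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def prefix_share(names: List[str]) -> float:
    """Fraction of names whose first sub-token is a common prefix."""
    if not names:
        return 0.0
    hits = 0
    for name in names:
        tokens = split_camel_case(name)
        if tokens and tokens[0] in COMMON_PREFIXES:
            hits += 1
    return hits / len(names)


def name_derivable_from_body(name: str, body: str) -> bool:
    """True if every sub-token of `name` appears in some identifier of `body`."""
    body_tokens = set()
    for ident in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", body):
        body_tokens.update(split_camel_case(ident))
    return all(tok in body_tokens for tok in split_camel_case(name))
```

Matching on the first sub-token rather than on a raw string prefix keeps a name such as `issueTicket` from being counted under `is`.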

[3]  arXiv:2201.08627 [pdf, other]
Title: A Systematic Literature Review of Empirical Research on Quality Requirements
Comments: Accepted for publication in the Requirements Engineering journal
Subjects: Software Engineering (cs.SE)

Quality requirements deal with how well a product should perform its intended functionality, for example start-up time and learnability. Researchers argue that they are important, yet studies indicate deficiencies in how they are handled in practice.
Our goal is to review the state of evidence for quality requirements. We want to understand the empirical research on quality requirements topics as well as evaluations of quality requirements solutions.
We used a hybrid method for our systematic literature review. We defined a start set based on two literature reviews combined with a keyword-based search of selected publication venues, and then applied snowballing to the start set.
We screened 530 papers and included 84 in our review. The case study is the most common research method (43 papers), followed by surveys (15) and tests (13). We found no replication studies. The two most commonly studied themes are (1) the differentiating characteristics of quality requirements compared to other types of requirements and (2) the importance and prevalence of quality requirements. Quality models, QUPER, and the NFR method are evaluated in several studies, with positive indications. Goal modeling is the only modeling approach evaluated. However, all studies are small-scale, and long-term costs and impact are not studied.
We conclude that more research is needed, as empirical research on quality requirements is not increasing at the same rate as software engineering research in general. We see a gap between research and practice: the proposed solutions are usually evaluated in an academic context, while surveys on quality requirements in industry indicate unsystematic handling of quality requirements.
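Snowballing from a start set can be viewed as a traversal of the citation graph. The following is a minimal sketch assuming the graph is available as two hypothetical dictionaries (`references` for backward snowballing, `citations` for forward snowballing) and that `is_relevant` stands in for the manual inclusion decision; it is not the authors' protocol.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Set, Tuple


def snowball(start_set: Iterable[str],
             references: Dict[str, List[str]],
             citations: Dict[str, List[str]],
             is_relevant: Callable[[str], bool]) -> Tuple[Set[str], Set[str]]:
    """Iterative backward and forward snowballing; returns (screened, included)."""
    start = list(start_set)
    screened: Set[str] = set(start)
    included: Set[str] = set(start)  # the start set is assumed to be relevant
    queue = deque(start)
    while queue:
        paper = queue.popleft()
        # Backward: papers this one cites. Forward: papers that cite it.
        for cand in references.get(paper, []) + citations.get(paper, []):
            if cand in screened:
                continue  # each paper is screened at most once
            screened.add(cand)
            if is_relevant(cand):
                included.add(cand)
                queue.append(cand)  # included papers seed the next iteration
    return screened, included
```

Because every candidate is screened but only relevant ones are expanded, the screened set naturally grows larger than the included set.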

[4]  arXiv:2201.08679 [pdf]
Title: Strategic Issues on Implementing a Software Process Improvement Program
Comments: InSITE Conference - Tampa, USA - 2015
Subjects: Software Engineering (cs.SE)

Software technology has a high impact on the global economy, as it does on many sectors of contemporary society. As a product that enables a wide variety of daily activities, software must be produced to a high standard of quality. Software quality depends on how the software is developed, which in turn relies on a large set of software development processes. However, the implementation and continuous improvement of the software processes that produce software products should be carefully institutionalized by software development organizations such as software factories, testing factories, and V&V organizations, among others. Institutionalizing programs such as a Software Process Improvement Program (SPI Program) requires strategic planning, which this article addresses from the perspective of specific models and frameworks, as well as through reflections based on software process engineering models and standards. In addition, we propose a set of strategic drivers to assist in implementing a Strategic Plan for an SPI Program, which organizations can consider before starting this kind of program.

[5]  arXiv:2201.08698 [pdf, other]
Title: Natural Attack for Pre-trained Models of Code
Comments: Accepted to the Technical Track of ICSE 2022
Subjects: Software Engineering (cs.SE)

Pre-trained models of code have achieved success in many important software engineering tasks. However, these powerful models are vulnerable to adversarial attacks that slightly perturb model inputs to make a victim model produce wrong outputs. Current work mainly attacks models of code with examples that preserve operational program semantics, but ignores a fundamental requirement for adversarial example generation: perturbations should look natural to human judges, which we refer to as the naturalness requirement.
In this paper, we propose ALERT (nAturaLnEss AwaRe ATtack), a black-box attack that adversarially transforms inputs to make victim models produce wrong outputs. Unlike prior work, ALERT considers the natural semantics of generated examples while preserving the operational semantics of the original inputs. Our user study demonstrates that human developers consistently judge the adversarial examples generated by ALERT to be more natural than those generated by the state-of-the-art approach of Zhang et al., which ignores the naturalness requirement. When attacking CodeBERT, our approach achieves attack success rates of 53.62%, 27.79%, and 35.78% on three downstream tasks: vulnerability prediction, clone detection, and code authorship attribution. On GraphCodeBERT, it achieves average success rates of 76.95%, 7.96%, and 61.47% on the same three tasks. On average, these results outperform the baseline by 14.07% and 18.56% on the two pre-trained models, respectively. Finally, we investigated the value of the generated adversarial examples for hardening victim models through adversarial fine-tuning, and showed that the accuracy of CodeBERT and GraphCodeBERT against ALERT-generated adversarial examples increased by 87.59% and 92.32%, respectively.
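To make the idea of a black-box, semantics-preserving attack concrete, the sketch below performs a greedy identifier-renaming attack that only queries the victim's output probabilities. It is a simplified illustration, not ALERT itself: the substitute names are supplied by hand in a hypothetical `substitutes` dictionary (ALERT aims to produce natural substitutes automatically), and the victim is assumed to be any callable returning label probabilities.

```python
import re
from typing import Callable, Dict, List, Optional

# Assumed victim interface: source code -> {label: probability}.
Victim = Callable[[str], Dict[str, float]]


def uses(code: str, name: str) -> bool:
    """True if `name` occurs as a whole word in `code`."""
    return re.search(rf"\b{re.escape(name)}\b", code) is not None


def rename(code: str, old: str, new: str) -> str:
    # Naive whole-word rename; a real tool would rename via the parse tree so
    # that string literals and unrelated scopes are left untouched.
    return re.sub(rf"\b{re.escape(old)}\b", new, code)


def greedy_rename_attack(code: str, true_label: str,
                         substitutes: Dict[str, List[str]],
                         victim: Victim) -> Optional[str]:
    """Greedy black-box identifier-substitution attack (simplified sketch).

    For each identifier, try its candidate names; stop as soon as the
    prediction flips, otherwise keep the candidate that lowers the
    true-label probability the most and move on to the next identifier.
    """
    current = code
    current_conf = victim(current)[true_label]
    for ident, candidates in substitutes.items():
        best_code, best_conf = None, current_conf
        for cand in candidates:
            if uses(current, cand):
                continue  # reusing an existing name could change behavior
            trial = rename(current, ident, cand)
            probs = victim(trial)
            if max(probs, key=probs.get) != true_label:
                return trial  # attack succeeded
            if probs[true_label] < best_conf:
                best_code, best_conf = trial, probs[true_label]
        if best_code is not None:
            current, current_conf = best_code, best_conf
    return None  # no substitution flipped the prediction
```

For example, with a toy victim that flags code as vulnerable whenever an identifier named `buf` appears, renaming `buf` to `data` flips the prediction and the function returns the renamed snippet.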

Cross-lists for Mon, 24 Jan 22

[6]  arXiv:2201.08441 (cross-list from cs.CR) [pdf, other]
Title: VUDENC: Vulnerability Detection with Deep Learning on a Natural Codebase for Python
Comments: Accepted Manuscript
Journal-ref: Information and Software Technology, Volume 144, April 2022, 106809
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG); Software Engineering (cs.SE)

Context: Identifying potentially vulnerable code is important for improving the security of our software systems. However, manually detecting software vulnerabilities requires expert knowledge and is time-consuming, so it must be supported by automated techniques.
Objective: Such automated vulnerability detection techniques should achieve high accuracy, point developers directly to the vulnerable code fragments, scale to real-world software, generalize across the boundaries of a specific software project, and require no or only moderate setup or configuration effort.
Method: In this article, we present VUDENC (Vulnerability Detection with Deep Learning on a Natural Codebase), a deep learning-based vulnerability detection tool that automatically learns features of vulnerable code from a large, real-world Python codebase. VUDENC applies a word2vec model to identify semantically similar code tokens and to provide a vector representation. A network of long short-term memory (LSTM) cells is then used to classify vulnerable code token sequences at a fine-grained level, highlight the specific areas in the source code that are likely to contain vulnerabilities, and provide confidence levels for its predictions.
Results: To evaluate VUDENC, we used 1,009 vulnerability-fixing commits from different GitHub repositories that contain seven different types of vulnerabilities (SQL injection, XSS, command injection, XSRF, remote code execution, path disclosure, and open redirect) for training. In the experimental evaluation, VUDENC achieves a recall of 78%-87%, a precision of 82%-96%, and an F1 score of 80%-90%. VUDENC's code, the vulnerability datasets, and the Python corpus for the word2vec model are available for reproduction.
Conclusions: Our experimental results suggest...
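Fine-grained classification of token sequences presupposes that source code is cut into short, labeled token windows. The sketch below shows one plausible way to do that with Python's standard `tokenize` module; the window size, the step, and the rule of labeling a window vulnerable if it touches a line changed by a vulnerability-fixing commit are assumptions for illustration, not VUDENC's published parameters. In a full pipeline each token would then be mapped to its word2vec vector before the windows reach the LSTM.

```python
import io
import tokenize
from typing import List, Set, Tuple

# Layout-only token types carry no code content and are dropped.
_SKIP = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT, tokenize.DEDENT,
         tokenize.COMMENT, tokenize.ENDMARKER}


def code_tokens(source: str) -> List[Tuple[str, int]]:
    """Return (token text, line number) pairs for a Python source string."""
    toks = tokenize.generate_tokens(io.StringIO(source).readline)
    return [(t.string, t.start[0]) for t in toks if t.type not in _SKIP]


def labeled_windows(source: str, vulnerable_lines: Set[int],
                    size: int = 5, step: int = 2) -> List[Tuple[List[str], int]]:
    """Slide a fixed-size window over the tokens; label 1 if any token lies
    on a line flagged as vulnerable (hypothetical labeling rule)."""
    tokens = code_tokens(source)
    windows = []
    for start in range(0, max(len(tokens) - size, 0) + 1, step):
        chunk = tokens[start:start + size]
        label = int(any(line in vulnerable_lines for _, line in chunk))
        windows.append(([text for text, _ in chunk], label))
    return windows
```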

[7]  arXiv:2201.08442 (cross-list from cs.LG) [pdf, other]
Title: Neural Network Quantization with AI Model Efficiency Toolkit (AIMET)
Comments: arXiv admin note: substantial text overlap with arXiv:2106.08295
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Hardware Architecture (cs.AR); Performance (cs.PF); Software Engineering (cs.SE)

While neural networks have advanced the frontiers in many machine learning applications, they often come at a high computational cost. Reducing the power and latency of neural network inference is vital to integrating modern networks into edge devices with strict power and compute requirements. Neural network quantization is one of the most effective ways of achieving these savings, but the additional noise it induces can lead to accuracy degradation. In this white paper, we present an overview of neural network quantization using AI Model Efficiency Toolkit (AIMET). AIMET is a library of state-of-the-art quantization and compression algorithms designed to ease the effort required for model optimization and thus drive the broader AI ecosystem towards low latency and energy-efficient inference. AIMET provides users with the ability to simulate as well as optimize PyTorch and TensorFlow models. Specifically for quantization, AIMET includes various post-training quantization (PTQ, cf. chapter 4) and quantization-aware training (QAT, cf. chapter 5) techniques that guarantee near floating-point accuracy for 8-bit fixed-point inference. We provide a practical guide to quantization via AIMET by covering PTQ and QAT workflows, code examples and practical tips that enable users to efficiently and effectively quantize models using AIMET and reap the benefits of low-bit integer inference.
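The accuracy loss mentioned above comes from rounding values onto a coarse integer grid. The following plain-Python sketch shows asymmetric (affine) 8-bit quantization and the quantize-dequantize step that quantization simulation relies on; it illustrates the general technique only and does not use or mirror AIMET's API.

```python
from typing import List, Tuple


def affine_params(x_min: float, x_max: float, bits: int = 8) -> Tuple[float, int]:
    """Scale and zero-point for asymmetric uniform quantization to [0, 2**bits - 1]."""
    # Widen the range so that 0.0 lies on the integer grid.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    qmax = 2 ** bits - 1
    scale = (x_max - x_min) / qmax if x_max > x_min else 1.0
    zero_point = int(round(-x_min / scale))
    return scale, zero_point


def quantize(xs: List[float], scale: float, zero_point: int, bits: int = 8) -> List[int]:
    """Map floats to clamped unsigned integers."""
    qmax = 2 ** bits - 1
    return [min(max(int(round(x / scale)) + zero_point, 0), qmax) for x in xs]


def dequantize(qs: List[int], scale: float, zero_point: int) -> List[float]:
    """Map integers back to the floating-point grid."""
    return [(q - zero_point) * scale for q in qs]


def fake_quant(xs: List[float], bits: int = 8) -> List[float]:
    """Quantize-dequantize: simulates quantization noise in floating point."""
    scale, zp = affine_params(min(xs), max(xs), bits)
    return dequantize(quantize(xs, scale, zp, bits), scale, zp)
```

Widening the range to include zero means 0.0 is represented exactly, which matters for zero padding and ReLU outputs; every other value moves by at most half a quantization step as long as it is not clamped at the edges of the integer range.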

[8]  arXiv:2201.08461 (cross-list from cs.CR) [pdf, ps, other]
Title: Polytope: Practical Memory Access Control for C++ Applications
Subjects: Cryptography and Security (cs.CR); Software Engineering (cs.SE)

Designing and implementing secure software is inarguably more important than ever. However, despite years of research into privilege-separating programs, it remains difficult to actually do so, and such efforts can take years of labor-intensive engineering to reach fruition. At the same time, new intra-process isolation primitives make strong data isolation and privilege separation more attractive from a performance perspective. Yet, substituting intra-process security boundaries for time-tested process boundaries opens the door to subtle but devastating privilege leaks. In this work, we present Polytope, a language extension to C++ that aims to make efficient privilege separation accessible to a wider audience of developers. Polytope defines a policy language encoded as C++11 attributes that separate code and data into distinct program partitions. A modified Clang front-end embeds the source-level policy as metadata nodes in the LLVM IR. An LLVM pass interprets the embedded policy and instruments the IR with code that enforces the source-level policy using Intel MPK. A run-time support library manages partitions, protection keys, dynamic memory operations, and indirect call target privileges. An evaluation demonstrates that Polytope provides protection equivalent to prior systems with a low annotation burden and comparable performance overhead. Polytope also makes privilege leaks that contradict the intended policy impossible to express.
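Intel MPK enforces policies like Polytope's through the per-thread PKRU register, which holds two bits for each of the 16 protection keys: an access-disable (AD) bit and a write-disable (WD) bit. The Python model below computes a PKRU value from per-key rights and checks whether an access would be permitted. It only models the bit layout; the rights strings and the mapping of partitions to keys are hypothetical, and real code would update the register with the `WRPKRU` instruction from native code.

```python
from typing import Dict

NUM_KEYS = 16  # MPK provides 16 protection keys, each with an AD and a WD bit


def pkru_value(rights: Dict[int, str]) -> int:
    """Compute a PKRU value from per-key rights: 'rw', 'r', or absent (no access)."""
    value = 0
    for key in range(NUM_KEYS):
        right = rights.get(key, "")
        if "r" not in right:
            value |= 1 << (2 * key)        # AD bit: disable all data access
        elif "w" not in right:
            value |= 1 << (2 * key + 1)    # WD bit: disable writes only
    return value


def allowed(pkru: int, key: int, write: bool) -> bool:
    """Would a data access to a page tagged with `key` be permitted?"""
    if (pkru >> (2 * key)) & 1:
        return False  # access disabled
    if write and (pkru >> (2 * key + 1)) & 1:
        return False  # write disabled
    return True
```

For instance, a partition that may read but not modify data tagged with key 2 would run with `pkru_value({0: "rw", 2: "r"})`: loads from key-2 pages succeed, while stores raise a protection-key fault.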

[9]  arXiv:2201.08662 (cross-list from quant-ph) [pdf, other]
Title: A Comprehensive Study of Bug Fixes in Quantum Programs
Subjects: Quantum Physics (quant-ph); Programming Languages (cs.PL); Software Engineering (cs.SE)

As quantum programming evolves, more and more quantum programming languages are being developed. As a result, debugging and testing quantum programs have become increasingly important. While bug fixing in classical programs has come a long way, there is a lack of research on quantum programs. To this end, this paper presents a comprehensive study of bug fixing in quantum programs. We collect and investigate 96 real-world bugs and their fixes from four popular quantum programming languages (Qiskit, Cirq, Q#, and ProjectQ). Our study shows that a high proportion of the bugs in quantum programs (over 80%) are quantum-specific, which calls for further research in the bug-fixing domain. We also summarize and extend the bug patterns in quantum programs and subdivide the most critical category, math-related bugs, to make it more applicable to the study of quantum programs. Our findings summarize the characteristics of bugs in quantum programs and provide a basis for research on testing and debugging quantum programs.

[10]  arXiv:2201.08810 (cross-list from cs.PL) [pdf, other]
Title: GAP-Gen: Guided Automatic Python Code Generation
Comments: 11 pages, 2 figures, 3 tables
Subjects: Programming Languages (cs.PL); Computation and Language (cs.CL); Machine Learning (cs.LG); Software Engineering (cs.SE)

Automatic code generation from natural language descriptions can be highly beneficial during software development. In this work, we propose GAP-Gen, an automatic code generation method guided by Python syntactic and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, a simplified version of the Abstract Syntax Tree (AST) that reduces the size and complexity of the AST while maintaining the crucial syntactic information of Python code. In addition to Syntax-Flow, we introduce Variable-Flow, which abstracts variable and function names consistently throughout the code. Rather than changing pre-training, our work focuses on modifying the fine-tuning process, which reduces computational requirements while retaining high performance on the automatic Python code generation task. GAP-Gen fine-tunes the transformer-based language models T5 and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet AdvTest, and Code-Docstring-Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on the automatic Python code generation task than previous work.
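Syntax-Flow and Variable-Flow are described only at a high level here, so the following is a rough approximation built on Python's standard `ast` module, not GAP-Gen's actual definitions. `syntax_flow` flattens the AST into a pre-order sequence of node-type names, and `variable_flow` consistently renames user-defined functions to `func_N` and variables and parameters to `var_N`, leaving built-ins alone. Both function names and the placeholder scheme are hypothetical; `ast.unparse` requires Python 3.9 or later.

```python
import ast
import builtins
from typing import Dict, Iterator, List, Tuple


def syntax_flow(source: str) -> List[str]:
    """Pre-order sequence of AST node-type names (illustrative approximation)."""
    def visit(node: ast.AST) -> Iterator[str]:
        if isinstance(node, ast.expr_context):
            return  # Load/Store/Del markers add noise, so they are dropped
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)
    return list(visit(ast.parse(source)))


class _Renamer(ast.NodeTransformer):
    """Consistently replaces user-defined names with numbered placeholders."""

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}

    def _alias(self, name: str, prefix: str) -> str:
        if name not in self.mapping:
            count = sum(1 for v in self.mapping.values() if v.startswith(prefix))
            self.mapping[name] = f"{prefix}{count}"
        return self.mapping[name]

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.AST:
        node.name = self._alias(node.name, "func_")
        self.generic_visit(node)  # then rename parameters and body
        return node

    def visit_arg(self, node: ast.arg) -> ast.AST:
        node.arg = self._alias(node.arg, "var_")
        self.generic_visit(node)
        return node

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        elif not hasattr(builtins, node.id):  # keep print, len, etc.
            node.id = self._alias(node.id, "var_")
        return node


def variable_flow(source: str) -> Tuple[str, Dict[str, str]]:
    """Return the renamed source and the original-to-placeholder mapping."""
    renamer = _Renamer()
    tree = renamer.visit(ast.parse(source))
    return ast.unparse(tree), renamer.mapping
```

Imports and attribute names are not handled, so renaming code that refers to imported modules would break it; the sketch is meant only to show the consistent-abstraction idea.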

Replacements for Mon, 24 Jan 22

[11]  arXiv:2112.01218 (replaced) [pdf, other]
Title: GraphCode2Vec: Generic Code Embedding via Lexical and Program Dependence Analyses
Subjects: Software Engineering (cs.SE)
[12]  arXiv:2201.04876 (replaced) [pdf, other]
Title: Towards a Reference Software Architecture for Human-AI Teaming in Smart Manufacturing
Comments: Conference: ICSE-NIER 2022 - The 44th International Conference on Software Engineering, 5 pages, 1 figure
Subjects: Software Engineering (cs.SE); Artificial Intelligence (cs.AI)
[13]  arXiv:2201.07351 (replaced) [pdf, other]
Title: A Taxonomy of HTML5 Canvas Bugs
Comments: 11 pages, 4 figures; alignment of listings fixed
Subjects: Software Engineering (cs.SE)
[14]  arXiv:2110.02870 (replaced) [pdf, other]
Title: Capturing Structural Locality in Non-parametric Language Models
Comments: ICLR 2022
Subjects: Computation and Language (cs.CL); Software Engineering (cs.SE)