Syntactic Vs. Semantic similarity of Artificial and Real Faults in Mutation Testing Studies

Ojdanic, Milos; Garg, Aayush; Khanfir, Ahmed; Degiovanni, Renzo; Papadakis, Mike; Traon, Yves Le

(Submitted on 29 Dec 2021)

Abstract: Fault seeding is typically used in controlled studies to evaluate and compare test techniques. Central to this practice is the hypothesis that artificially seeded faults have some realistic properties and thus yield realistic experimental results. In an attempt to strengthen realism, a recent line of research uses advanced machine learning techniques, such as deep learning and Natural Language Processing (NLP), to seed faults that look syntactically like real ones, implying that fault realism is related to syntactic similarity. This raises the question of whether seeding syntactically similar faults indeed results in semantically similar faults and, more generally, whether syntactically dissimilar faults are semantically far from the real ones. We answer this question by employing 4 fault-seeding techniques (PiTest, a popular mutation testing tool; IBIR, a tool with manually crafted fault patterns; DeepMutation, a learning-based fault-seeding framework; and CodeBERT, a novel mutation testing tool that uses code embeddings) and demonstrate that syntactic similarity does not reflect semantic similarity. We also show that 60%, 47%, 43%, and 7% of the real faults of Defects4J V2 are semantically resembled by CodeBERT, PiTest, IBIR, and DeepMutation faults, respectively. We then perform an objective comparison between the techniques and find that CodeBERT and PiTest have similar fault detection capabilities that subsume those of IBIR and DeepMutation, and that IBIR is the most cost-effective technique. Moreover, the overall fault detection of PiTest, CodeBERT, IBIR, and DeepMutation was, on average, 54%, 53%, 37%, and 7%, respectively.
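The abstract does not state how syntactic and semantic similarity are measured. The sketch below contrasts the two notions under stated assumptions: syntactic similarity is approximated with a token-level `difflib.SequenceMatcher` ratio, and semantic similarity with the Ochiai coefficient over the sets of tests each fault makes fail. Both metrics, the code snippets, and the test names (`t1`, `t2`, `t5`) are hypothetical illustrations, not necessarily the paper's exact setup.

```python
import math
from difflib import SequenceMatcher
from typing import Set


def syntactic_similarity(code_a: str, code_b: str) -> float:
    """Token-level similarity in [0, 1]; an illustrative stand-in metric (assumption)."""
    return SequenceMatcher(None, code_a.split(), code_b.split()).ratio()


def semantic_similarity(failing_a: Set[str], failing_b: Set[str]) -> float:
    """Ochiai coefficient between the sets of tests that each fault makes fail (assumption)."""
    if not failing_a or not failing_b:
        return 0.0
    overlap = len(failing_a & failing_b)
    return overlap / math.sqrt(len(failing_a) * len(failing_b))


# Hypothetical real fault and two hypothetical seeded faults (mutants):
# each entry is (tokenised faulty code, set of failing test names).
real_fault = ("if ( x > 0 ) return a ;", {"t1", "t2"})
mutant_close_syntax = ("if ( x >= 0 ) return a ;", {"t5"})
mutant_far_syntax = ("return x > 0 ? a : b ;", {"t1", "t2"})

for name, (code, failing) in [("close", mutant_close_syntax), ("far", mutant_far_syntax)]:
    syn = syntactic_similarity(real_fault[0], code)
    sem = semantic_similarity(real_fault[1], failing)
    print(f"{name}: syntactic={syn:.2f} semantic={sem:.2f}")
```

With these made-up inputs the script prints `close: syntactic=0.89 semantic=0.00` and `far: syntactic=0.56 semantic=1.00`: the syntactically close mutant shares no failing tests with the real fault, while the syntactically distant one fails exactly the same tests. This is the kind of mismatch between syntactic and semantic similarity the abstract describes.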

Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2112.14508 [cs.SE]
	(or arXiv:2112.14508v1 [cs.SE] for this version)

Submission history

From: Milos Ojdanic
[v1] Wed, 29 Dec 2021 11:27:08 GMT (22120kb,D)
