Title: Improving type information inferred by decompilers with supervised machine learning

Authors: Javier Escalada (1), Ted Scully (2), Francisco Ortin (1 and 2) ((1) University of Oviedo, (2) Cork Institute of Technology)
Abstract: In software reverse engineering, decompilation is the process of recovering source code from binary files. Decompilers are used when it is necessary to understand or analyze software for which the source code is not available. Although existing decompilers commonly obtain source code with the same behavior as the binaries, that source code is usually hard to interpret and certainly differs from the original code written by the programmer. Massive codebases could be used to build supervised machine learning models aimed at improving existing decompilers. In this article, we build different classification models capable of inferring the high-level type returned by functions, with significantly higher accuracy than existing decompilers. We automatically instrument C source code to allow the association of binary patterns with their corresponding high-level constructs. A dataset is created with a collection of real open-source applications plus a huge number of synthetic programs. Our system is able to predict function return types with a 79.1% F1-measure, whereas the best decompiler obtains a 30% F1-measure. Moreover, we document the binary patterns used by our classifier to allow their addition in the implementation of existing decompilers.
Subjects: Software Engineering (cs.SE); Machine Learning (cs.LG); Programming Languages (cs.PL)
ACM classes: I.5.1; I.2.6; D.3.4; D.2.3
Cite as: arXiv:2101.08116 [cs.SE]
  (or arXiv:2101.08116v2 [cs.SE] for this version)

Submission history

From: Francisco Ortin
[v1] Tue, 19 Jan 2021 11:45:46 GMT (1008kb)
[v2] Wed, 24 Feb 2021 11:01:27 GMT (642kb,D)
