BinBert: Binary Code Understanding with a Fine-tunable and Execution-aware Transformer

Artuso, Fiorella; Mormando, Marco; Di Luna, Giuseppe A.; Querzoni, Leonardo

(Submitted on 13 Aug 2022)

Abstract: A recent trend in binary code analysis promotes the use of neural solutions based on instruction embedding models. An instruction embedding model is a neural network that transforms sequences of assembly instructions into embedding vectors. If the embedding network is trained so that the translation from code to vectors partially preserves the code's semantics, the network effectively represents an assembly code model.
In this paper, we present BinBert, a novel assembly code model. BinBert is built on a transformer pre-trained on a huge dataset of both assembly instruction sequences and symbolic execution information. BinBert can be applied to assembly instruction sequences and it is fine-tunable, i.e., it can be re-trained as part of a neural architecture on task-specific data. Through fine-tuning, BinBert learns how to apply the general knowledge acquired with pre-training to the specific task.
We evaluated BinBert on a multi-task benchmark that we specifically designed to test the understanding of assembly code. The benchmark comprises several tasks, some taken from the literature and a few novel ones that we designed, mixing intrinsic and downstream tasks.
Our results show that BinBert outperforms state-of-the-art models for binary instruction embedding, raising the bar for binary code understanding.
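
To make the input/output contract of an instruction embedding model concrete, here is a minimal Python sketch. It is not BinBert and does not reproduce its transformer or its symbolic-execution pre-training. The tokenizer, the replacement of immediates with an `IMM` placeholder, the random (untrained) per-token vectors, and mean pooling are all assumptions chosen for illustration, and every function name is hypothetical. The sketch only shows how a sequence of assembly instructions becomes one fixed-size vector that can then be compared with cosine similarity.

```python
import math
import random
import re

DIM = 16  # embedding size; arbitrary for this sketch

# Assumed tokenization: hex immediates, decimal immediates, identifiers
# (mnemonics, registers, size keywords) and memory-operand punctuation.
# Commas and whitespace are dropped.
TOKEN_RE = re.compile(r"0x[0-9a-f]+|\d+|[a-z_][a-z0-9_]*|[\[\]+\-*]")
IMM_RE = re.compile(r"0x[0-9a-f]+|\d+")


def tokenize(instruction: str) -> list[str]:
    """Split one instruction into tokens, replacing immediates with 'IMM' (assumed normalization)."""
    tokens = TOKEN_RE.findall(instruction.lower())
    return ["IMM" if IMM_RE.fullmatch(t) else t for t in tokens]


def token_vector(token: str) -> list[float]:
    """Hypothetical lookup table: a fixed pseudo-random vector per token (untrained)."""
    rng = random.Random(token)  # string seeds give the same vector on every run
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def embed_sequence(instructions: list[str]) -> list[float]:
    """Map a sequence of instructions to one vector by mean-pooling its token vectors."""
    tokens = [t for ins in instructions for t in tokenize(ins)]
    if not tokens:
        raise ValueError("empty instruction sequence")
    vectors = [token_vector(t) for t in tokens]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    block_a = ["mov eax, 0x10", "add eax, ebx"]
    block_b = ["mov eax, 0x20", "add eax, ebx"]  # differs only in the immediate
    block_c = ["push rbp", "mov rbp, rsp", "sub rsp, 0x20"]
    # Same tokens after normalization, so this is 1.0 up to rounding.
    print(cosine(embed_sequence(block_a), embed_sequence(block_b)))
    print(cosine(embed_sequence(block_a), embed_sequence(block_c)))
```

Because the token vectors here are random rather than learned, similarity scores reflect only shared tokens and carry no real semantics. A trained model such as the one described above would instead learn its representations during pre-training and, being built on a transformer, would make each token's representation depend on its surrounding context.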

Subjects:	Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Cite as:	arXiv:2208.06692 [cs.CR]
	(or arXiv:2208.06692v1 [cs.CR] for this version)

Submission history

From: Giuseppe Antonio Di Luna
[v1] Sat, 13 Aug 2022 17:48:52 GMT (338kb,D)
