VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection

Hanif, Hazim; Maffeis, Sergio

doi:10.1109/IJCNN55064.2022.9892280

(Submitted on 25 May 2022)

Abstract: This paper presents VulBERTa, a deep learning approach to detect security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of the code syntax and semantics, which we leverage to train vulnerability detection classifiers. We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets (Vuldeepecker, Draper, REVEAL and muVuldeepecker) and benchmarks (CodeXGLUE and D2A). The evaluation results show that VulBERTa achieves state-of-the-art performance and outperforms existing approaches across different datasets, despite its conceptual simplicity and its limited cost in terms of training data size and number of model parameters.

Comments:	Accepted as a conference paper at IJCNN 2022
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
MSC classes:	68M25 (Primary), 68T07 (Secondary)
ACM classes:	D.2.4; I.2.4; I.2.6
Journal reference:	International Joint Conference on Neural Networks (IJCNN), 2022
DOI:	10.1109/IJCNN55064.2022.9892280
Cite as:	arXiv:2205.12424 [cs.CR]
	(or arXiv:2205.12424v1 [cs.CR] for this version)

Submission history

From: Hazim Hanif
[v1] Wed, 25 May 2022 00:56:43 GMT
