On using distributed representations of source code for the detection of C security vulnerabilities

Coimbra, David; Reis, Sofia; Abreu, Rui; Păsăreanu, Corina; Erdogmus, Hakan

Computer Science > Cryptography and Security

Title: On using distributed representations of source code for the detection of C security vulnerabilities

Authors: David Coimbra, Sofia Reis, Rui Abreu, Corina Păsăreanu, Hakan Erdogmus

(Submitted on 1 Jun 2021)

Abstract: This paper presents an evaluation of the code representation model Code2vec when trained on the task of detecting security vulnerabilities in C source code. We leverage the open-source library astminer to extract path-contexts from the abstract syntax trees of a corpus of labeled C functions. Code2vec is trained on the resulting path-contexts with the task of classifying a function as vulnerable or non-vulnerable. Using the CodeXGLUE benchmark, we show that the accuracy of Code2vec for this task is comparable to simple transformer-based methods such as pre-trained RoBERTa, and outperforms more naive NLP-based methods. We achieved an accuracy of 61.43% while maintaining low computational requirements relative to larger models.
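To make the notion of a path-context concrete: in the Code2vec formulation, a path-context is a triple consisting of a start token, the sequence of AST node types connecting two leaves through their lowest common ancestor, and an end token. The following is a minimal sketch of that idea only. It does not use astminer or reproduce its output format; the toy tree, the node-type names, the `^`/`_` separators, and the helper names `collect_leaves` and `path_contexts` are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical hand-built AST for the C statement `buf[i] = x;`.
# Inner node: (node_type, [children]); leaf: (node_type, "token").
TOY_AST = ("Assign", [
    ("ArrayRef", [("Name", "buf"), ("Name", "i")]),
    ("Name", "x"),
])


def collect_leaves(node, ancestors=(), index=0):
    """Return [(token, path)], where path runs root-to-leaf as (child_index, node_type) pairs."""
    node_type, payload = node
    path = ancestors + ((index, node_type),)
    if isinstance(payload, str):  # leaf: payload is the token text
        return [(payload, path)]
    leaves = []
    for i, child in enumerate(payload):
        leaves.extend(collect_leaves(child, path, i))
    return leaves


def path_contexts(ast, max_length=8):
    """Return (start_token, path, end_token) triples for every pair of leaves.

    The path climbs from the first leaf to the lowest common ancestor ('^')
    and then descends to the second leaf ('_'). Paths with more than
    max_length nodes are skipped.
    """
    contexts = []
    for (tok_a, path_a), (tok_b, path_b) in combinations(collect_leaves(ast), 2):
        # The shared root-to-node prefix ends at the lowest common ancestor.
        common = 0
        while (common < min(len(path_a), len(path_b))
               and path_a[common] == path_b[common]):
            common += 1
        up = [t for _, t in reversed(path_a[common:])]   # leaf a, upward
        top = path_a[common - 1][1]                      # lowest common ancestor
        down = [t for _, t in path_b[common:]]           # downward to leaf b
        if len(up) + 1 + len(down) > max_length:
            continue
        path = "^".join(up + [top]) + "_" + "_".join(down)
        contexts.append((tok_a, path, tok_b))
    return contexts


if __name__ == "__main__":
    for context in path_contexts(TOY_AST):
        print(context)
```

In a classification setting like the one the abstract describes, a function would be represented by the bag of such triples extracted from its body, and the model would learn to aggregate them into a single vector used to predict the vulnerable or non-vulnerable label. The `max_length` cutoff here is an illustrative stand-in for the path-length limits such extractors typically expose.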

Comments:	Submitted to DX 2021
Subjects:	Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Programming Languages (cs.PL); Software Engineering (cs.SE)
Cite as:	arXiv:2106.01367 [cs.CR]
	(or arXiv:2106.01367v1 [cs.CR] for this version)

Submission history

From: David Coimbra
[v1] Tue, 1 Jun 2021 21:18:23 GMT (444kb,D)
