We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: ICodeNet -- A Hierarchical Neural Network Approach for Source Code Author Identification

Abstract: With the open-source revolution, source codes are now more easily accessible than ever. This has, however, made it easier for malicious users and institutions to copy the code without giving regards to the license, or credit to the original author. Therefore, source code author identification is a critical task with paramount importance. In this paper, we propose ICodeNet - a hierarchical neural network that can be used for source code file-level tasks. The ICodeNet processes source code in image format and is employed for the task of per file author identification. The ICodeNet consists of an ImageNet trained VGG encoder followed by a shallow neural network. The shallow network is based either on CNN or LSTM. Different variations of models are evaluated on a source code author classification dataset. We have also compared our image-based hierarchical neural network model with simple image-based CNN architecture and text-based CNN and LSTM models to highlight its novelty and efficiency.
Comments: Accepted at ICMLC 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Information Retrieval (cs.IR); Programming Languages (cs.PL)
DOI: 10.1145/3457682.3457709
Cite as: arXiv:2102.00230 [cs.LG]
  (or arXiv:2102.00230v1 [cs.LG] for this version)

Submission history

From: Raviraj Joshi [view email]
[v1] Sat, 30 Jan 2021 14:05:22 GMT (754kb,D)

Link back to: arXiv, form interface, contact.