Ensemble Models for Neural Source Code Summarization of Subroutines

LeClair, Alexander; Bansal, Aakash; McMillan, Collin

(Submitted on 23 Jul 2021)

Abstract: A source code summary of a subroutine is a brief description of that subroutine. Summaries underpin a majority of documentation consumed by programmers, such as the method summaries in JavaDocs. Source code summarization is the task of writing these summaries. At present, most state-of-the-art approaches for code summarization are neural network-based solutions akin to seq2seq, graph2seq, and other encoder-decoder architectures. The input to the encoder is source code, while the decoder helps predict the natural language summary. While these models tend to be similar in structure, evidence is emerging that different models make different contributions to prediction quality -- differences in model performance are orthogonal and complementary rather than uniform over the entire dataset. In this paper, we explore the orthogonal nature of different neural code summarization approaches and propose ensemble models to exploit this orthogonality for better overall performance. We demonstrate that a simple ensemble strategy boosts performance by up to 14.8%, and provide an explanation for this boost. The takeaway from this work is that a relatively small change to the inference procedure in most neural code summarization techniques leads to outsized improvements in prediction quality.
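The abstract does not say which ensemble strategy is used or how the inference procedure changes. As an illustration only, the sketch below assumes one common choice: at each step of greedy decoding, average the next-token probability distributions predicted by several models and pick the most likely token. The `Model` interface, the function names, and the token ids are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: a "model" is any callable that maps the tokens
# decoded so far to a probability distribution over the vocabulary.
# A real summarization model would also condition on the source code input.
Model = Callable[[Sequence[int]], List[float]]


def average_distributions(dists: Sequence[Sequence[float]]) -> List[float]:
    """Return the element-wise mean of several probability distributions."""
    n = len(dists)
    return [sum(column) / n for column in zip(*dists)]


def ensemble_greedy_decode(
    models: Sequence[Model],
    start_token: int,
    end_token: int,
    max_len: int = 20,
) -> List[int]:
    """Greedy decoding in which every step uses the averaged prediction."""
    tokens = [start_token]
    for _ in range(max_len):
        # Each model predicts independently; only the outputs are combined.
        averaged = average_distributions([model(tokens) for model in models])
        next_token = max(range(len(averaged)), key=averaged.__getitem__)
        tokens.append(next_token)
        if next_token == end_token:
            break
    return tokens
```

Under this assumption the trained models stay untouched and only the decoding loop changes. For example, with two toy models over a four-token vocabulary, `ensemble_greedy_decode([model_a, model_b], start_token=0, end_token=1)` returns the sequence chosen by the averaged distribution, which can differ from what either model would produce alone.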

Comments:	ICSME 2021
Subjects:	Software Engineering (cs.SE)
Cite as:	arXiv:2107.11423 [cs.SE]
	(or arXiv:2107.11423v1 [cs.SE] for this version)

Submission history

From: Alexander LeClair
[v1] Fri, 23 Jul 2021 19:12:55 GMT (1687kb,D)
