We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.SE

Change to browse by:

cs

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Software Engineering

Title: A Topic Guided Pointer-Generator Model for Generating Natural Language Code Summaries

Abstract: Code summarization is the task of generating natural language description of source code, which is important for program understanding and maintenance. Existing approaches treat the task as a machine translation problem (e.g., from Java to English) and applied Neural Machine Translation models to solve the problem. These approaches only consider a given code unit (e.g., a method) without its broader context. The lacking of context may hinder the NMT model from gathering sufficient information for code summarization. Furthermore, existing approaches use a fixed vocabulary and do not fully consider the words in code, while many words in the code summary may come from the code. In this work, we present a neural network model named ToPNN for code summarization, which uses the topics in a broader context (e.g., class) to guide the neural networks that combine the generation of new words and the copy of existing words in code. Based on the model we present an approach for generating natural language code summaries at the method level (i.e., method comments). We evaluate our approach using a dataset with 4,203,565 commented Java methods. The results show significant improvement over state-of-the-art approaches and confirm the positive effect of class topics and the copy mechanism.
Subjects: Software Engineering (cs.SE)
Cite as: arXiv:2107.01642 [cs.SE]
  (or arXiv:2107.01642v1 [cs.SE] for this version)

Submission history

From: Xin Wang [view email]
[v1] Sun, 4 Jul 2021 14:18:24 GMT (3026kb,D)

Link back to: arXiv, form interface, contact.