Large Language Models (GPT) Struggle to Answer Multiple-Choice Questions about Code

Savelka, Jaromir; Agarwal, Arav; Bogart, Christopher; Sakr, Majd

(Submitted on 9 Mar 2023)

Abstract: We analyzed the effectiveness of three generative pre-trained transformer (GPT) models in answering multiple-choice question (MCQ) assessments, often involving short snippets of code, from introductory and intermediate programming courses at the postsecondary level. This emerging technology stirs countless discussions of its potential uses (e.g., exercise generation, code explanation) as well as misuses in programming education (e.g., cheating). However, the capabilities of GPT models, and their limitations in reasoning about and/or analyzing code in educational settings, have been under-explored. We evaluated several of OpenAI's GPT models on formative and summative MCQ assessments from three Python courses (530 questions). We found that MCQs containing code snippets are not answered as successfully as those that contain only natural language. While questions that require filling in a blank in the code or completing a natural language statement about the snippet are handled rather successfully, MCQs that require analysis of and/or reasoning about the code (e.g., what is true/false about the snippet, or what is its output) appear to be the most challenging. These findings can be leveraged by educators to adapt their instructional practices and assessments in programming courses, so that GPT becomes a valuable assistant for a learner as opposed to a source of confusion and/or a potential hindrance in the learning process.

Comments:	12 pages
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2303.08033 [cs.CL]
	(or arXiv:2303.08033v1 [cs.CL] for this version)

Submission history

From: Jaromir Savelka
[v1] Thu, 9 Mar 2023 16:52:12 GMT (625kb,D)
