Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning

Zhao, Jun; Tong, Jingqi; Mou, Yurong; Zhang, Ming; Zhang, Qi; Huang, Xuanjing

(Submitted on 5 May 2024)

Abstract: Human cognition exhibits systematic compositionality, the algebraic ability to generate infinitely many novel combinations from a finite set of learned components, which is key to understanding and reasoning about complex logic. In this work, we investigate the compositionality of large language models (LLMs) in mathematical reasoning. Specifically, we construct a new dataset, MathTrap, by introducing carefully designed logical traps into the problem descriptions of MATH and GSM8K. Since problems with logical flaws are quite rare in the real world, these represent "unseen" cases to LLMs. Solving them requires the models to systematically compose (1) the mathematical knowledge involved in the original problems with (2) the knowledge related to the introduced traps. Our experiments show that although LLMs possess both components of the requisite knowledge, they do not spontaneously combine them to handle these novel cases. We explore several methods to mitigate this deficiency, such as natural language prompts, few-shot demonstrations, and fine-tuning. We find that LLMs' performance can be passively improved through these external interventions. Overall, systematic compositionality remains an open challenge for large language models.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.06680 [cs.CL]
	(or arXiv:2405.06680v1 [cs.CL] for this version)

Submission history

From: Jun Zhao
[v1] Sun, 5 May 2024 16:35:30 GMT (39kb,D)