Specializing Smaller Language Models towards Multi-Step Reasoning

Fu, Yao; Peng, Hao; Ou, Litu; Sabharwal, Ashish; Khot, Tushar

(Submitted on 30 Jan 2023)

Abstract: The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models (100+ billion parameters). We show that such abilities can, in fact, be distilled down from GPT-3.5 ($\ge$ 175B) to T5 variants ($\le$ 11B). We propose model specialization, which concentrates a model's ability on a target task. The hypothesis is that large models (commonly viewed as larger than 100B) have strong modeling power, but that power is spread across a large spectrum of tasks. Small models (commonly viewed as smaller than 10B) have limited capacity, but if we concentrate that capacity on a specific target task, the model can achieve decently improved performance. We use multi-step math reasoning as our testbed because it is a very typical emergent ability. We show two important aspects of model abilities: (1) there exists a very complex balance/tradeoff between language models' multi-dimensional abilities; (2) by paying the price of decreased generic ability, we can clearly lift up the scaling curve of models smaller than 10B towards a specialized multi-step math reasoning ability. We further give comprehensive discussions of important design choices for better generalization, including the tuning data format, the start model checkpoint, and a new model selection method. We hope our practice and discoveries can serve as an important attempt towards specialized smaller models in the new research paradigm set by LLMs.

Comments:	Preprint
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2301.12726 [cs.CL]
	(or arXiv:2301.12726v1 [cs.CL] for this version)

Submission history

From: Yao Fu
[v1] Mon, 30 Jan 2023 08:51:19 GMT (1869kb,D)
