Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-Intensive Tasks

Kang, Minki; Lee, Seanie; Baek, Jinheon; Kawaguchi, Kenji; Hwang, Sung Ju

(Submitted on 28 May 2023 (v1), last revised 30 Oct 2023 (this version, v2))

Abstract: Large Language Models (LLMs) have shown promising performance in knowledge-intensive reasoning tasks that require a compound understanding of knowledge. However, deploying LLMs in real-world applications can be challenging due to their high computational requirements and concerns about data privacy. Previous studies have focused on building task-specific small Language Models (LMs) by fine-tuning them with labeled data or distilling LLMs. However, these approaches are ill-suited for knowledge-intensive reasoning tasks because small LMs have limited capacity to memorize the required knowledge. Motivated by our theoretical analysis of memorization, we propose Knowledge-Augmented Reasoning Distillation (KARD), a novel method that fine-tunes small LMs to generate rationales obtained from LLMs, with augmented knowledge retrieved from an external knowledge base. We further propose a neural reranker to obtain documents relevant to rationale generation. We empirically show that KARD significantly improves the performance of small T5 and GPT models on challenging knowledge-intensive reasoning datasets, namely MedQA-USMLE, StrategyQA, and OpenbookQA. Notably, our method enables 250M-parameter T5 models to outperform fine-tuned 3B-parameter models, which have 12 times as many parameters, on both the MedQA-USMLE and StrategyQA benchmarks.
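The data flow described in the abstract (retrieve passages from a knowledge base, then train a small LM to produce the teacher's rationale given those passages) can be illustrated with a minimal sketch. This is not the authors' implementation: the token-overlap scorer is a toy stand-in for a real retriever and for the neural reranker, using the question as the retrieval query is an assumption, and every name here (`overlap_score`, `retrieve`, `build_example`, `TrainingExample`, the prompt layout) is hypothetical.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer; a toy stand-in for a real tokenizer."""
    stripped = (tok.strip(".,?!").lower() for tok in text.split())
    return [tok for tok in stripped if tok]


def overlap_score(query: str, passage: str) -> int:
    """Count tokens shared by query and passage (toy relevance score)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum((q & p).values())


def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    """Return the k highest-scoring passages; ties keep knowledge-base order."""
    ranked = sorted(
        knowledge_base,
        key=lambda passage: overlap_score(query, passage),
        reverse=True,
    )
    return ranked[:k]


@dataclass
class TrainingExample:
    source: str  # input to the small LM: retrieved knowledge + question
    target: str  # output to learn: the rationale obtained from the LLM


def build_example(
    question: str,
    teacher_rationale: str,
    knowledge_base: list[str],
    k: int = 2,
) -> TrainingExample:
    """Pair a knowledge-augmented prompt with the teacher rationale.

    Assumption: the question is used as the retrieval query; the prompt
    layout below is illustrative only.
    """
    passages = retrieve(question, knowledge_base, k=k)
    knowledge = "\n".join(f"- {passage}" for passage in passages)
    source = f"Knowledge:\n{knowledge}\nQuestion: {question}"
    return TrainingExample(source=source, target=teacher_rationale)
```

Pairs built this way would then be fed to an ordinary sequence-to-sequence fine-tuning loop for the small model. In practice the toy scorer would be replaced by a proper retriever and a learned reranker.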

Comments:	NeurIPS 2023
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2305.18395 [cs.CL]
	(or arXiv:2305.18395v2 [cs.CL] for this version)

Submission history

From: Minki Kang
[v1] Sun, 28 May 2023 13:00:00 GMT (330kb,D)
[v2] Mon, 30 Oct 2023 08:20:14 GMT (1063kb,D)
