Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning

Rita, Mathieu; Strub, Florian; Chaabouni, Rahma; Michel, Paul; Dupoux, Emmanuel; Pietquin, Olivier

(Submitted on 30 Apr 2024)

Abstract: While Reinforcement Learning (RL) has proven essential for tuning large language models (LLMs), it can lead to reward over-optimization (ROO). Existing approaches address ROO by adding KL regularization, which requires computationally expensive hyperparameter tuning. Additionally, KL regularization focuses solely on regularizing the language policy, neglecting a potential source of regularization: the reward function itself. Inspired by demonstration-guided RL, we introduce Reward Calibration from Demonstration (RCfD), which leverages human demonstrations and a reward model to recalibrate the reward objective. Formally, given a prompt, the RCfD objective minimizes the distance between the demonstrations' and the LLM's rewards rather than directly maximizing the reward function. This objective shift avoids incentivizing the LLM to exploit the reward model and promotes more natural and diverse language generation. We show the effectiveness of RCfD on three language tasks, where it achieves performance comparable to carefully tuned baselines while mitigating ROO.
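
To make the objective shift concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which distance is used, so the absolute difference here is an assumption, and `toy_reward_model`, `standard_reward`, and `calibrated_reward` are hypothetical names introduced only for illustration. The toy scorer simply counts words, standing in for a learned reward model that can be exploited.

```python
from typing import Callable

# A reward model maps (prompt, completion) to a scalar score.
RewardModel = Callable[[str, str], float]


def toy_reward_model(prompt: str, completion: str) -> float:
    """Hypothetical exploitable scorer: more words means more reward."""
    return float(len(completion.split()))


def standard_reward(rm: RewardModel, prompt: str, completion: str) -> float:
    """Usual RL signal: the raw reward-model score, to be maximized."""
    return rm(prompt, completion)


def calibrated_reward(
    rm: RewardModel, prompt: str, completion: str, demonstration: str
) -> float:
    """Calibrated signal: negative distance to the demonstration's score.

    Assumption: absolute difference as the distance. The maximum value
    is 0, reached when the completion scores exactly like the
    demonstration, so overshooting is penalized as much as undershooting.
    """
    gap = rm(prompt, completion) - rm(prompt, demonstration)
    return -abs(gap)


prompt = "What is the capital of France?"
demonstration = "Paris is the capital of France."
candidates = {
    "short": "Paris.",
    "matched": "The capital of France is Paris.",
    "padded": " ".join(["Paris"] * 12),  # exploits the word-count scorer
}

for name, text in candidates.items():
    raw = standard_reward(toy_reward_model, prompt, text)
    cal = calibrated_reward(toy_reward_model, prompt, text, demonstration)
    print(f"{name}: raw={raw}, calibrated={cal}")
```

Under the raw score the padded completion is preferred, whereas under the calibrated score the completion whose reward matches the demonstration's is preferred. This is the sense in which the recalibrated objective removes the incentive to exploit the reward model.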

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2404.19409 [cs.CL]
	(or arXiv:2404.19409v1 [cs.CL] for this version)

Submission history

From: Mathieu Rita
[v1] Tue, 30 Apr 2024 09:57:21 GMT (3202kb,D)
