Offline RL for Natural Language Generation with Implicit Language Q Learning

Snell, Charlie; Kostrikov, Ilya; Su, Yi; Yang, Mengjiao; Levine, Sergey

(Submitted on 5 Jun 2022 (this version), latest version 1 May 2023 (v2))

Abstract: Large language models distill broad knowledge from text corpora. However, they can be inconsistent when it comes to completing user-specified tasks. This issue can be addressed by finetuning such models via supervised learning on curated datasets, or via reinforcement learning. In this work, we propose a novel offline RL-motivated method, implicit language Q-learning (ILQL), designed for use on language models, that combines the flexible utility optimization framework of traditional RL algorithms with supervised learning's ability to leverage existing data, as well as its simplicity and stability. Our method, based on dynamic programming, employs a blend of value conservatism alongside an implicit dataset support constraint in learning value functions, which are then used to guide language model generations towards maximizing utility. In addition to empirically validating ILQL, we present a detailed empirical analysis of situations where offline RL can be useful in natural language generation settings, demonstrating how it can be a more effective utility optimizer than prior approaches for end-to-end dialogue, and how it can effectively optimize high-variance reward functions based on subjective judgement, such as whether or not to label a comment as an example of toxic speech.
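The abstract names three ingredients: value functions learned by dynamic programming, value conservatism plus an implicit dataset support constraint, and value-guided generation. The following is a minimal scalar toy sketch of how such pieces can fit together. It assumes (this abstract does not confirm the details) an IQL-style expectile loss for fitting V to Q, a one-step TD target for Q, a CQL-style logsumexp penalty for conservatism, and decoding that shifts the base model's token logits by beta * (Q - V). All function names and the parameters tau, beta, and gamma are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List


# Hypothetical toy sketch; not the ILQL reference implementation.
# Assumption: V is fit to Q with an IQL-style expectile loss, which pushes
# V toward the upper range of Q values seen in the data without ever
# evaluating actions outside the dataset.
def expectile_loss(q: float, v: float, tau: float = 0.7) -> float:
    """Asymmetric squared loss |tau - 1(u < 0)| * u^2 with u = q - v."""
    u = q - v
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u


def fit_expectile(q_samples: List[float], tau: float,
                  steps: int = 2000, lr: float = 0.1) -> float:
    """Gradient descent on the mean expectile loss over dataset Q values."""
    v = 0.0
    for _ in range(steps):
        grad = 0.0
        for q in q_samples:
            u = q - v
            weight = tau if u >= 0 else 1.0 - tau
            grad += -2.0 * weight * u  # d/dv of weight * (q - v)^2
        v -= lr * grad / len(q_samples)
    return v


def q_td_target(reward: float, next_v: float, done: bool,
                gamma: float = 0.99) -> float:
    """Assumed one-step target: Q(s, a) <- r + gamma * V(s')."""
    return reward + (0.0 if done else gamma * next_v)


def conservative_penalty(q_values: List[float], data_action: int) -> float:
    """Assumed CQL-style term: logsumexp_a Q(s, a) - Q(s, a_data).

    Minimizing it lowers Q on tokens other than the observed one,
    relative to the token that actually appears in the data.
    """
    m = max(q_values)
    lse = m + math.log(sum(math.exp(q - m) for q in q_values))
    return lse - q_values[data_action]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def guided_token_probs(lm_logits: List[float], q_values: List[float],
                       v: float, beta: float) -> List[float]:
    """Assumed decoding rule: pi(a|s) proportional to
    pi_lm(a|s) * exp(beta * (Q(s, a) - V(s)))."""
    shifted = [logit + beta * (q - v) for logit, q in zip(lm_logits, q_values)]
    return softmax(shifted)


if __name__ == "__main__":
    print(round(fit_expectile([1.0, 2.0, 3.0], tau=0.5), 3))  # plain mean
    print(round(fit_expectile([1.0, 2.0, 3.0], tau=0.9), 3))  # above the mean
    print(guided_token_probs([0.0, 0.0], [1.0, 0.0], v=0.5, beta=2.0))
```

With tau = 0.5 the fitted V is simply the mean of the dataset Q values; raising tau moves it toward the largest in-data value, which is one way to read an "implicit" support constraint. In guided_token_probs, V(s) is the same for every token and cancels after normalization; it is kept only so the shift reads as an advantage Q - V.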

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2206.11871 [cs.CL]
	(or arXiv:2206.11871v1 [cs.CL] for this version)

Submission history

From: Charlie Snell
[v1] Sun, 5 Jun 2022 18:38:42 GMT (1185kb,D)
[v2] Mon, 1 May 2023 04:42:27 GMT (1224kb,D)
