Local Search for Policy Iteration in Continuous Control

Springenberg, Jost Tobias; Heess, Nicolas; Mankowitz, Daniel; Merel, Josh; Byravan, Arunkumar; Abdolmaleki, Abbas; Kay, Jackie; Degrave, Jonas; Schrittwieser, Julian; Tassa, Yuval; Buchli, Jonas; Belov, Dan; Riedmiller, Martin

(Submitted on 12 Oct 2020)

Abstract: We present an algorithm for local, regularized, policy improvement in reinforcement learning (RL) that allows us to formulate model-based and model-free variants in a single framework. Our algorithm can be interpreted as a natural extension of work on KL-regularized RL and introduces a form of tree search for continuous action spaces. We demonstrate that additional computation spent on model-based policy improvement during learning can improve data efficiency, and confirm that model-based policy improvement during action selection can also be beneficial. Quantitatively, our algorithm improves data efficiency on several continuous control benchmarks (when a model is learned in parallel), and it provides significant improvements in wall-clock time in high-dimensional domains (when a ground truth model is available). The unified framework also helps us to better understand the space of model-based and model-free algorithms. In particular, we demonstrate that some benefits attributed to model-based RL can be obtained without a model, simply by utilizing more computation.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2010.05545 [cs.LG]
	(or arXiv:2010.05545v1 [cs.LG] for this version)

Submission history

From: Jost Tobias Springenberg
[v1] Mon, 12 Oct 2020 09:02:48 GMT (1181kb,D)
