Learning Latent State Spaces for Planning through Reward Prediction

Havens, Aaron; Ouyang, Yi; Nagarajan, Prabhat; Fujita, Yasuhiro

(Submitted on 9 Dec 2019)

Abstract: Model-based reinforcement learning methods typically learn models for high-dimensional state spaces by reconstructing and predicting the original observations. Drawing inspiration from model-free reinforcement learning, we instead propose learning a latent dynamics model directly from rewards. In this work, we introduce a model-based planning framework that learns a latent reward prediction model and then plans in the latent state space. The latent representation is learned exclusively from multi-step reward prediction, which we show to be the only information necessary for successful planning. With this framework, we benefit from the concise representation of model-free methods while retaining the data efficiency of model-based algorithms. We demonstrate our framework in multi-pendulum and multi-cheetah environments, where the agent observes several pendulums or cheetahs but only one of them produces rewards. In these environments, the agent must construct a concise latent representation that filters out the irrelevant observations. We find that our method learns an accurate latent reward prediction model in the presence of this irrelevant information, whereas existing model-based methods fail to do so. Planning in the learned latent state space yields strong performance and high sample efficiency compared with model-free and model-based baselines.
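The core idea described above, a latent model trained only on multi-step reward prediction and then used for planning, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear encoder, linear latent dynamics, and linear reward head, the names `LatentRewardModel`, `multistep_reward_loss`, and `plan_random_shooting`, and the random-shooting planner are all hypothetical choices made for illustration. Gradient-based fitting of the parameters (for example with an autodiff library) is omitted; the sketch only shows the loss being minimized and how the learned model would be used for planning.

```python
import numpy as np


class LatentRewardModel:
    """Hypothetical linear latent model (assumption, for illustration only):
    z_0 = E o,  z_{k+1} = A z_k + B a_k,  r_hat_k = w . z_{k+1}.
    There is no decoder: reward is the only training signal."""

    def __init__(self, obs_dim, act_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.E = rng.normal(scale=0.1, size=(latent_dim, obs_dim))  # encoder
        self.A = np.eye(latent_dim)                                 # latent dynamics
        self.B = rng.normal(scale=0.1, size=(latent_dim, act_dim))  # action effect
        self.w = rng.normal(scale=0.1, size=latent_dim)             # reward head

    def encode(self, obs):
        return self.E @ obs

    def rollout_rewards(self, obs, actions):
        """Predict r_hat_0..r_hat_{H-1} for an (H, act_dim) action sequence."""
        z = self.encode(obs)
        preds = np.empty(len(actions))
        for k, a in enumerate(actions):
            z = self.A @ z + self.B @ a  # step the latent state, never the observation
            preds[k] = self.w @ z
        return preds


def multistep_reward_loss(model, obs, actions, rewards):
    """Mean squared error between predicted and observed rewards over the
    horizon. Note there is no observation-reconstruction term."""
    preds = model.rollout_rewards(obs, actions)
    return float(np.mean((preds - rewards) ** 2))


def plan_random_shooting(model, obs, horizon, n_candidates, act_dim, seed=0):
    """Random-shooting MPC in latent space (a simple stand-in planner):
    sample action sequences in [-1, 1], score each by its summed predicted
    reward, and return the first action of the best-scoring sequence."""
    rng = np.random.default_rng(seed)
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, act_dim))
    returns = np.array([model.rollout_rewards(obs, seq).sum() for seq in candidates])
    return candidates[np.argmax(returns)][0]


if __name__ == "__main__":
    model = LatentRewardModel(obs_dim=6, act_dim=1, latent_dim=2)
    obs = np.ones(6)                # e.g. a flattened observation with distractor dims
    actions = np.zeros((5, 1))
    rewards = model.rollout_rewards(obs, actions)  # stand-in for observed rewards
    print(multistep_reward_loss(model, obs, actions, rewards))  # 0.0
    print(plan_random_shooting(model, obs, horizon=5, n_candidates=100, act_dim=1))
```

Because the loss depends only on rewards, nothing forces the encoder `E` to preserve observation dimensions that never influence reward, which is the property the abstract relies on to ignore the non-rewarding pendulums or cheetahs. The planner never decodes back to observations; every candidate is scored entirely in the latent space.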

Comments:	Deep RL Workshop, NeurIPS 2019, Vancouver
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1912.04201 [cs.LG]
	(or arXiv:1912.04201v1 [cs.LG] for this version)

Submission history

From: Aaron Havens
[v1] Mon, 9 Dec 2019 17:32:51 GMT (374kb,D)
