Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning

Zhang, Shenao

Title: Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning

Authors: Shenao Zhang

(Submitted on 16 Sep 2022)

Abstract: Provably efficient Model-Based Reinforcement Learning (MBRL) based on optimism or posterior sampling (PSRL) is guaranteed to attain global optimality asymptotically by introducing a complexity measure of the model. However, the complexity might grow exponentially for the simplest nonlinear models, where global convergence is impossible within finite iterations. When the model suffers a large generalization error, which is quantitatively measured by the model complexity, the uncertainty can be large. The sampled model that the current policy is greedily optimized upon will thus be unsettled, resulting in aggressive policy updates and over-exploration. In this work, we propose Conservative Dual Policy Optimization (CDPO), which involves a Referential Update and a Conservative Update. The policy is first optimized under a reference model, which imitates the mechanism of PSRL while offering more stability. A conservative range of randomness is guaranteed by maximizing the expectation of model value. Without harmful sampling procedures, CDPO can still achieve the same regret as PSRL. More importantly, CDPO enjoys monotonic policy improvement and global optimality simultaneously. Empirical results also validate the exploration efficiency of CDPO.

Comments:	Published at NeurIPS 2022
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2209.07676 [cs.LG]
	(or arXiv:2209.07676v1 [cs.LG] for this version)

Submission history

From: Shenao Zhang
[v1] Fri, 16 Sep 2022 02:27:01 GMT (5140kb,D)
