Damped Anderson Mixing for Deep Reinforcement Learning: Acceleration, Convergence, and Stabilization

Sun, Ke; Wang, Yafei; Liu, Yi; Zhao, Yingnan; Pan, Bo; Jui, Shangling; Jiang, Bei; Kong, Linglong

(Submitted on 17 Oct 2021 (v1), last revised 20 Oct 2021 (this version, v2))

Abstract: Anderson mixing has been heuristically applied to reinforcement learning (RL) algorithms to accelerate convergence and improve the sampling efficiency of deep RL. Despite these empirical gains, a rigorous mathematical justification for the benefits of Anderson mixing in RL has not yet been put forward. In this paper, we provide deeper insights into a class of acceleration schemes built on Anderson mixing that improve the convergence of deep RL algorithms. Our main results establish a connection between Anderson mixing and quasi-Newton methods and prove that Anderson mixing increases the convergence radius of policy iteration schemes by an extra contraction factor. The analysis is rooted in the fixed-point iteration nature of RL. We further propose a stabilization strategy that introduces a stable regularization term in Anderson mixing and a differentiable, non-expansive MellowMax operator, which together allow both faster convergence and more stable behavior. Extensive experiments demonstrate that our proposed method enhances the convergence, stability, and performance of RL algorithms.
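
To make the ingredients named in the abstract concrete, the following is a minimal tabular sketch, not the authors' algorithm. It assumes the standard MellowMax definition, (1/omega) * log(mean(exp(omega * x))) with omega > 0, and a generic textbook form of Anderson mixing with a damping factor and a Tikhonov-style regularizer. The function names, the parameters (m, beta, reg, tol), and the way the regularizer is scaled are all illustrative assumptions rather than anything specified by the paper.

```python
import numpy as np


def mellowmax(x, omega=5.0, axis=-1):
    """MellowMax along `axis`: (1/omega) * log(mean(exp(omega * x))), omega > 0."""
    x = np.asarray(x, dtype=float)
    shift = np.max(x, axis=axis, keepdims=True)  # subtract the max to avoid overflow
    mean_exp = np.mean(np.exp(omega * (x - shift)), axis=axis)
    return np.squeeze(shift, axis=axis) + np.log(mean_exp) / omega


def mellowmax_bellman(Q, R, P, gamma=0.9, omega=5.0):
    """One tabular backup: Q <- R + gamma * E_{s'}[mellowmax_a Q(s', a)].

    Q and R have shape (S, A); P has shape (S, A, S) with P[s, a, :] summing to 1.
    """
    v = mellowmax(Q, omega=omega, axis=1)  # state values, shape (S,)
    return R + gamma * (P @ v)


def anderson_fixed_point(g, x0, m=5, beta=1.0, reg=1e-8, tol=1e-10, max_iter=500):
    """Damped, regularized Anderson mixing for x = g(x) (illustrative sketch).

    m: history length, beta: damping factor, reg: relative Tikhonov weight.
    Returns (approximate fixed point, number of iterations used).
    """
    x0 = np.asarray(x0, dtype=float)
    shape = x0.shape
    x = x0.ravel()
    xs, gs = [], []
    for k in range(max_iter):
        gx = np.asarray(g(x.reshape(shape)), dtype=float).ravel()
        if np.max(np.abs(gx - x)) < tol:  # sup-norm residual small enough
            return x.reshape(shape), k
        xs = (xs + [x])[-m:]   # keep only the last m iterates
        gs = (gs + [gx])[-m:]  # and their images under g
        X = np.stack(xs, axis=1)
        G = np.stack(gs, axis=1)
        F = G - X              # residual history, one column per stored iterate
        h = F.shape[1]
        A = F.T @ F
        A = A + reg * (np.trace(A) / h) * np.eye(h)  # regularize the normal equations
        w = np.linalg.solve(A, np.ones(h))
        alpha = w / w.sum()    # mixing weights, constrained to sum to 1
        x = (1.0 - beta) * (X @ alpha) + beta * (G @ alpha)
    return x.reshape(shape), max_iter
```

Under these assumptions, a call such as `anderson_fixed_point(lambda Q: mellowmax_bellman(Q, R, P), np.zeros((S, A)))` would apply the mixing step to the MellowMax backup of a small MDP with hypothetical arrays `R` and `P`. The sketch itself gives no convergence guarantee for nonlinear maps; it only illustrates how the mixing weights, the damping factor, and the regularizer fit together.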

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2110.08896 [cs.LG]
	(or arXiv:2110.08896v2 [cs.LG] for this version)

Submission history

From: Yafei Wang
[v1] Sun, 17 Oct 2021 19:07:25 GMT (295kb,D)
[v2] Wed, 20 Oct 2021 05:12:46 GMT (737kb,D)
