Natural Policy Gradients In Reinforcement Learning Explained

van Heeswijk, W. J. A.

(Submitted on 5 Sep 2022)

Abstract: Traditional policy gradient methods are fundamentally flawed. Natural gradients converge quicker and better, forming the foundation of contemporary Reinforcement Learning algorithms such as Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO). This lecture note aims to clarify the intuition behind natural policy gradients, focusing on the thought process and the key mathematical constructs.

Comments:	14 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC)
Cite as:	arXiv:2209.01820 [cs.LG]
	(or arXiv:2209.01820v1 [cs.LG] for this version)

Submission history

From: Wouter van Heeswijk PhD
[v1] Mon, 5 Sep 2022 08:06:29 GMT
