Proximal Deterministic Policy Gradient

Maggipinto, Marco; Susto, Gian Antonio; Chaudhari, Pratik

(Submitted on 3 Aug 2020)

Abstract: This paper introduces two simple techniques to improve off-policy Reinforcement Learning (RL) algorithms. First, we formulate off-policy RL as a stochastic proximal point iteration: the target network plays the role of the optimization variable, and the value network computes the proximal operator. Second, we exploit the two value functions commonly employed in state-of-the-art off-policy algorithms to provide an improved action-value estimate through bootstrapping, with only a limited increase in computational resources. Finally, we demonstrate significant performance improvements over state-of-the-art algorithms on standard continuous-control RL benchmarks.
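The first idea, a stochastic proximal point iteration in which the target parameters are the optimization variable, can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes a scalar least-squares objective f(theta) = mean_i 0.5 * (theta - y_i)^2 in place of the temporal-difference loss. The target parameter theta_bar is the optimization variable. An inner SGD loop on the proximal objective f(theta) + (1 / (2 * lam)) * (theta - theta_bar)^2 approximately computes prox_{lam f}(theta_bar), and theta_bar is then overwritten with the result. The function name, the objective, and the hyperparameters (lam, lr, step counts) are hypothetical. The abstract does not specify how the two value functions are combined for bootstrapping, so that technique is not sketched here.

```python
import random


def stochastic_proximal_point(data, lam=1.0, outer_steps=50, inner_steps=50,
                              lr=0.05, seed=0):
    """Toy stochastic proximal point iteration (hypothetical illustration).

    Objective: f(theta) = mean_i 0.5 * (theta - y_i)^2, a stand-in for a TD loss.
    theta_bar plays the role of the target parameter (the optimization variable);
    theta plays the role of the online value parameter that computes the prox.
    """
    rng = random.Random(seed)
    theta_bar = 0.0  # target parameter (arbitrary initialization)
    theta = theta_bar  # online parameter starts at the target
    for _ in range(outer_steps):
        # Inner loop: SGD on f(theta) + (1 / (2 * lam)) * (theta - theta_bar)^2,
        # which approximately evaluates prox_{lam f}(theta_bar).
        for _ in range(inner_steps):
            y = rng.choice(data)  # single-sample stochastic gradient of f
            grad = (theta - y) + (theta - theta_bar) / lam
            theta -= lr * grad
        # Outer step: the target is replaced by the approximate proximal point.
        theta_bar = theta
    return theta_bar


if __name__ == "__main__":
    print(stochastic_proximal_point([1.0, 2.0, 3.0]))
```

With data = [1.0, 2.0, 3.0] the returned target settles near the minimizer of f (the mean, 2.0), up to SGD noise. A smaller lam strengthens the proximal term and keeps each target update closer to the previous target, which is one way to read the stabilizing role usually attributed to target networks.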

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2008.00759 [cs.LG]
	(or arXiv:2008.00759v1 [cs.LG] for this version)

Submission history

From: Marco Maggipinto
[v1] Mon, 3 Aug 2020 10:19:59 GMT (688kb,D)
