Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

Böhmer, Wendelin; Guo, Rong; Obermayer, Klaus

(Submitted on 22 Dec 2016)

Abstract: This paper investigates a type of instability that is linked to greedy policy improvement in approximated reinforcement learning. We show empirically that non-deterministic policy improvement can stabilize methods like least-squares policy iteration (LSPI) by controlling the stochasticity of the improvement step. Additionally, we show that a suitable representation of the value function also stabilizes the solution to some degree. The presented approach is simple and should transfer easily to more sophisticated methods such as deep reinforcement learning.
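The abstract contrasts greedy policy improvement with a non-deterministic one whose stochasticity can be controlled, but it does not say which stochastic rule is used. The sketch below therefore assumes a Boltzmann (softmax) distribution over approximate Q-values with a temperature parameter; the function names and the `temperature` parameter are hypothetical and not taken from the paper. It illustrates the stability argument: when two actions have nearly equal estimated values, a tiny change in the approximation flips the greedy policy completely, whereas the softmax policy shifts only slightly. The last helper shows, under the same assumption, where such a policy would enter an LSTD-Q style evaluation step as used in LSPI: through the expected next-state feature vector.

```python
import math
from typing import List, Sequence


def greedy_policy(q_values: Sequence[float]) -> List[float]:
    """Deterministic improvement: all probability mass on one argmax action."""
    best = max(range(len(q_values)), key=lambda a: q_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(q_values))]


def softmax_policy(q_values: Sequence[float], temperature: float) -> List[float]:
    """Non-deterministic improvement (assumed Boltzmann form, hypothetical name).

    `temperature` controls the stochasticity: large values approach a
    uniform policy, values close to 0 approach the greedy policy.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    m = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def expected_next_features(
    phi_next: Sequence[Sequence[float]], policy: Sequence[float]
) -> List[float]:
    """Return sum_a pi(a|s') * phi(s', a).

    phi_next[a] is the feature vector of next state s' paired with action a.
    With a stochastic policy, an LSTD-Q style update would use this expected
    vector in place of the features of a single greedy action.
    """
    dim = len(phi_next[0])
    return [sum(p * phi[k] for p, phi in zip(policy, phi_next)) for k in range(dim)]


if __name__ == "__main__":
    # Two nearly tied actions whose estimates swap after a small approximation change.
    before, after = [1.0, 1.01], [1.01, 1.0]
    print("greedy: ", greedy_policy(before), "->", greedy_policy(after))
    print("softmax:", softmax_policy(before, 0.1), "->", softmax_policy(after, 0.1))
```

In this example the greedy policy jumps from one action to the other, while the softmax policy with `temperature=0.1` moves each action probability by only about 0.05. Lowering the temperature recovers greedy behaviour, so the parameter acts as a dial between the two regimes.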

Comments:	This paper was presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on 3 and 4 December 2016 in Barcelona, Spain.
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1612.07548 [cs.AI]
	(or arXiv:1612.07548v1 [cs.AI] for this version)

Submission history

From: Wendelin Böhmer
[v1] Thu, 22 Dec 2016 11:30:35 GMT (168kb)