We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.AI

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Artificial Intelligence

Title: Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

Abstract: This paper investigates a type of instability that is linked to the greedy policy improvement in approximated reinforcement learning. We show empirically that non-deterministic policy improvement can stabilize methods like LSPI by controlling the improvements' stochasticity. Additionally we show that a suitable representation of the value function also stabilizes the solution to some degree. The presented approach is simple and should also be easily transferable to more sophisticated algorithms like deep reinforcement learning.
Comments: This paper has been presented at the 13th European Workshop on Reinforcement Learning (EWRL 2016) on the 3rd and 4th of December 2016 in Barcelona, Spain
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1612.07548 [cs.AI]
  (or arXiv:1612.07548v1 [cs.AI] for this version)

Submission history

From: Wendelin Böhmer [view email]
[v1] Thu, 22 Dec 2016 11:30:35 GMT (168kb)

Link back to: arXiv, form interface, contact.