Title: Safe Policy Improvement with Baseline Bootstrapping

Abstract: This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement Learning (Batch RL): from a fixed dataset and without direct access to the true environment, train a policy that is guaranteed to perform at least as well as the baseline policy used to collect the data. Our approach, called SPI with Baseline Bootstrapping (SPIBB), is inspired by the knows-what-it-knows paradigm: it bootstraps the trained policy with the baseline when the uncertainty is high. Our first algorithm, $\Pi_b$-SPIBB, comes with SPI theoretical guarantees. We also implement a variant, $\Pi_{\leq b}$-SPIBB, that is even more efficient in practice. We apply our algorithms to a motivational stochastic gridworld domain and further demonstrate on randomly generated MDPs the superiority of SPIBB with respect to existing algorithms, not only in safety but also in mean performance. Finally, we implement a model-free version of SPIBB and show its benefits on a navigation task with a deep RL implementation called SPIBB-DQN, which is, to the best of our knowledge, the first RL algorithm relying on a neural network representation that is able to train efficiently and reliably from batch data, without any interaction with the environment.
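The bootstrapping idea can be illustrated with a small tabular sketch of one policy-improvement step. It assumes, as a hypothetical choice for illustration, that uncertainty is measured by how often each state-action pair appears in the dataset: pairs seen fewer than `n_wedge` times are treated as uncertain and keep the baseline's probability, while the remaining baseline probability mass in each state is moved to the highest-valued well-estimated action. The function name `pi_b_spibb_projection`, the array layout, and the threshold `n_wedge` are assumptions; this is a sketch, not the authors' implementation.

```python
import numpy as np


def pi_b_spibb_projection(q, pi_b, counts, n_wedge):
    """One greedy improvement step that copies the baseline on uncertain pairs.

    q, pi_b, counts: arrays of shape (n_states, n_actions) holding action-value
    estimates, baseline probabilities, and dataset visit counts.
    n_wedge: hypothetical count threshold; pairs seen fewer times are 'uncertain'.
    """
    n_states, _ = q.shape
    pi = np.zeros_like(pi_b, dtype=float)
    bootstrapped = counts < n_wedge  # mask of uncertain state-action pairs
    for x in range(n_states):
        boot = bootstrapped[x]
        # Uncertain actions keep exactly the baseline probability.
        pi[x, boot] = pi_b[x, boot]
        free_actions = np.flatnonzero(~boot)
        if free_actions.size > 0:
            # The baseline mass of well-estimated actions goes to the best of them.
            best = free_actions[np.argmax(q[x, free_actions])]
            pi[x, best] += pi_b[x, free_actions].sum()
    return pi
```

If every pair in a state is well estimated, the step reduces to ordinary greedy improvement in that state; if every pair is uncertain, the baseline is returned unchanged there.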
Comments: accepted as a long oral at ICML2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as: arXiv:1712.06924 [cs.LG]
  (or arXiv:1712.06924v5 [cs.LG] for this version)

Submission history

From: Romain Laroche
[v1] Tue, 19 Dec 2017 13:43:41 GMT (11866kb,D)
[v2] Wed, 20 Dec 2017 19:52:03 GMT (11866kb,D)
[v3] Thu, 18 Jan 2018 21:37:53 GMT (8422kb,D)
[v4] Thu, 14 Jun 2018 19:54:34 GMT (8002kb,D)
[v5] Fri, 7 Jun 2019 17:45:54 GMT (18526kb,D)