Computer Science > Machine Learning
Title: UCB and InfoGain Exploration via $\boldsymbol{Q}$-Ensembles
(Submitted on 5 Jun 2017 (v1), revised 11 Jun 2017 (this version, v2), latest version 7 Nov 2017 (v3))
Abstract: We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the $Q$-learning setting. First, we propose an exploration strategy based on upper-confidence bounds (UCB). Next, we define an "InfoGain" exploration bonus, which depends on the disagreement of the $Q$-ensemble. Our experiments show significant gains on the Atari benchmark.
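As an illustration of the UCB idea mentioned in the abstract, the sketch below picks the action that maximizes the ensemble mean of the Q-estimates plus a multiple of their standard deviation. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `ucb_action`, the parameter `ucb_coef`, the use of the population standard deviation as the uncertainty term, and the toy Q-values are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def ucb_action(q_ensemble, ucb_coef=1.0):
    """Pick an action from an ensemble of Q-estimates for a single state.

    q_ensemble: list of K lists; q_ensemble[k][a] is the k-th ensemble
                member's Q-estimate for action a (hypothetical layout).
    ucb_coef:   weight on the disagreement term (assumed hyperparameter).
    Returns (chosen_action_index, per_action_scores).
    """
    num_actions = len(q_ensemble[0])
    scores = []
    for a in range(num_actions):
        # Collect every ensemble member's estimate for this action.
        estimates = [q_head[a] for q_head in q_ensemble]
        # Optimistic score: empirical mean plus scaled ensemble spread.
        scores.append(mean(estimates) + ucb_coef * pstdev(estimates))
    best = max(range(num_actions), key=scores.__getitem__)
    return best, scores


# Toy example: two ensemble members, two actions.
# Action 0 has the higher mean and no disagreement;
# action 1 has a lower mean but the members disagree about it.
toy_q = [[1.0, 0.0], [1.0, 1.6]]
greedy_action, _ = ucb_action(toy_q, ucb_coef=0.0)   # ignores disagreement
optimistic_action, _ = ucb_action(toy_q, ucb_coef=1.0)  # rewards disagreement
```

With `ucb_coef=0.0` the rule reduces to acting greedily on the ensemble mean and selects action 0; with `ucb_coef=1.0` the disagreement on action 1 raises its score above that of action 0, so the uncertain action is explored instead.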
Submission history
From: Richard Y. Chen
[v1] Mon, 5 Jun 2017 19:01:26 GMT (2158kb)
[v2] Sun, 11 Jun 2017 18:54:53 GMT (2158kb)
[v3] Tue, 7 Nov 2017 20:45:59 GMT (3079kb)