UCB and InfoGain Exploration via $\boldsymbol{Q}$-Ensembles

Chen, Richard Y.; Sidor, Szymon; Abbeel, Pieter; Schulman, John

(Submitted on 5 Jun 2017 (v1), revised 11 Jun 2017 (this version, v2), latest version 7 Nov 2017 (v3))

Abstract: We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the $Q$-learning setting. First, we propose an exploration strategy based on upper confidence bounds (UCB). Next, we define an "InfoGain" exploration bonus, which depends on the disagreement of the $Q$-ensemble. Our experiments show significant gains on the Atari benchmark.
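Below is a minimal Python sketch of the two ideas in the abstract. Its specific forms are assumptions. The UCB rule is assumed to score each action by the empirical mean of the $K$ members $Q_1, \dots, Q_K$ plus $\lambda$ times their empirical standard deviation. The abstract says only that the InfoGain bonus "depends on the disagreement of the $Q$-ensemble", so the sketch substitutes a Jensen-Shannon divergence between the members' Boltzmann (softmax) action distributions. That exact form is an assumption. The names `ucb_action`, `js_disagreement`, `lam` and `temperature` are hypothetical.

```python
import math
from typing import List, Sequence

# q_values[k][a]: member k's Q estimate for action a in the current state (assumed layout).
QTable = Sequence[Sequence[float]]


def ucb_action(q_values: QTable, lam: float = 0.1) -> int:
    """Pick argmax_a of mean_k Q_k(s, a) + lam * std_k Q_k(s, a) (assumed UCB form)."""
    num_members = len(q_values)
    best_action, best_score = 0, -math.inf
    for a in range(len(q_values[0])):
        column = [q_values[k][a] for k in range(num_members)]
        mean = sum(column) / num_members
        # Population standard deviation across ensemble members.
        std = math.sqrt(sum((q - mean) ** 2 for q in column) / num_members)
        score = mean + lam * std
        if score > best_score:  # strict '>' breaks ties toward the lower action index
            best_action, best_score = a, score
    return best_action


def softmax(values: Sequence[float], temperature: float) -> List[float]:
    """Boltzmann distribution over actions; the max is subtracted for numerical stability."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def js_disagreement(q_values: QTable, temperature: float = 1.0) -> float:
    """Generalized Jensen-Shannon divergence between the members' Boltzmann policies,
    H(mixture) - mean_k H(p_k). It is zero when all members have identical Q-values
    (a stand-in for the InfoGain bonus, not necessarily the paper's exact definition)."""
    dists = [softmax(q, temperature) for q in q_values]
    num_members = len(dists)
    mixture = [sum(d[a] for d in dists) / num_members for a in range(len(dists[0]))]
    return entropy(mixture) - sum(entropy(d) for d in dists) / num_members
```

Consider two members that agree on action 0 (`[[1.0, 0.0], [1.0, 2.0]]`) but disagree on action 1. Both actions have the same ensemble mean. Any positive `lam` therefore favours the uncertain action 1. With `lam=0.0` the rule reduces to greedy selection on the ensemble mean, with ties broken toward action 0.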

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1706.01502 [cs.LG]
	(or arXiv:1706.01502v2 [cs.LG] for this version)

Submission history

From: Richard Y. Chen
[v1] Mon, 5 Jun 2017 19:01:26 GMT (2158kb)
[v2] Sun, 11 Jun 2017 18:54:53 GMT (2158kb)
[v3] Tue, 7 Nov 2017 20:45:59 GMT (3079kb)
