UCB Exploration via Q-Ensembles

Chen, Richard Y.; Sidor, Szymon; Abbeel, Pieter; Schulman, John

(Submitted on 5 Jun 2017 (v1), last revised 7 Nov 2017 (this version, v3))

Abstract: We show how an ensemble of $Q^*$-functions can be leveraged for more effective exploration in deep reinforcement learning. We build on well-established algorithms from the bandit setting and adapt them to the $Q$-learning setting. We propose an exploration strategy based on upper-confidence bounds (UCB). Our experiments show significant gains on the Atari benchmark.
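
The abstract does not spell out the action-selection rule, so the following is only a minimal sketch of how a UCB-style choice over a Q-ensemble might look. It assumes the upper confidence bound is formed as the ensemble mean plus a multiple of the ensemble standard deviation. The function name `ucb_action`, the coefficient `lam`, and the array layout are illustrative assumptions, not taken from the text above.

```python
import numpy as np


def ucb_action(q_values, lam=0.1):
    """Pick an action from an ensemble of Q-value estimates.

    q_values: array-like of shape (K, A), one row per ensemble member
              and one column per action (hypothetical layout).
    lam:      exploration coefficient (hypothetical name); lam = 0
              reduces to greedy selection on the ensemble mean.
    """
    q = np.asarray(q_values, dtype=float)
    mean = q.mean(axis=0)  # average estimate per action
    std = q.std(axis=0)    # ensemble disagreement per action
    # Assumed UCB form: optimism proportional to ensemble disagreement.
    return int(np.argmax(mean + lam * std))


# Two ensemble members, two actions. Both members agree on action 0
# but disagree strongly on action 1.
q_ensemble = [[1.0, 0.0],
              [1.0, 1.8]]
greedy = ucb_action(q_ensemble, lam=0.0)
optimistic = ucb_action(q_ensemble, lam=1.0)
```

With `lam=0.0` the rule is greedy on the mean and picks action 0. With `lam=1.0` the disagreement on action 1 raises its bound above that of action 0, so the rule picks action 1 and explores the uncertain action.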

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1706.01502 [cs.LG]
	(or arXiv:1706.01502v3 [cs.LG] for this version)

Submission history

From: Richard Y. Chen [view email]
[v1] Mon, 5 Jun 2017 19:01:26 GMT (2158kb)
[v2] Sun, 11 Jun 2017 18:54:53 GMT (2158kb)
[v3] Tue, 7 Nov 2017 20:45:59 GMT (3079kb)
