Bootstrapped Thompson Sampling and Deep Exploration

Osband, Ian; Van Roy, Benjamin

(Submitted on 1 Jul 2015)

Abstract: This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions. The approach is based on a bootstrap technique that uses a combination of observed and artificially generated data. The latter serves to induce a prior distribution which, as we will demonstrate, is critical to effective exploration. We explain how the approach can be applied to multi-armed bandit and reinforcement learning problems and how it relates to Thompson sampling. The approach is particularly well-suited for contexts in which exploration is coupled with deep learning, since in these settings, maintaining or generating samples from a posterior distribution becomes computationally infeasible.
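The idea summarized in the abstract can be illustrated with a minimal sketch for a Bernoulli multi-armed bandit. This is not the authors' algorithm as specified in the paper; it is an illustration under stated assumptions. The function names (`bootstrap_mean`, `bootstrapped_thompson`), the choice of one artificial failure and one artificial success per arm as the "artificially generated data", and the random tie-breaking are all hypothetical choices made for this example.

```python
import random


def bootstrap_mean(data, rng):
    """Mean of a with-replacement resample of `data`, same size as `data`."""
    sample = [rng.choice(data) for _ in data]
    return sum(sample) / len(sample)


def bootstrapped_thompson(true_means, steps, seed=0, prior_data=(0.0, 1.0)):
    """Run a bootstrap-based bandit loop and return the pull count per arm.

    Assumption: each arm's history is seeded with `prior_data`, artificial
    rewards that play the role of a prior. No posterior is maintained or
    sampled; randomness comes only from resampling the stored rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    # Observed rewards per arm, initialized with the artificial data.
    history = [list(prior_data) for _ in range(n_arms)]
    pulls = [0] * n_arms

    for _ in range(steps):
        # One bootstrap estimate per arm stands in for a posterior sample.
        estimates = [bootstrap_mean(h, rng) for h in history]
        # Act greedily on the resampled estimates, breaking ties at random.
        arm = max(range(n_arms), key=lambda a: (estimates[a], rng.random()))
        # Simulated Bernoulli reward from the (hidden) true mean.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        history[arm].append(reward)
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(bootstrapped_thompson([0.2, 0.5, 0.8], steps=1000))
```

In this sketch, the artificial data keeps every arm's resampled estimate random even after an unlucky early run. With an empty `prior_data` an arm whose first few rewards were all zero would always resample to zero, which is one way to see why the abstract calls the induced prior critical to exploration.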

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:1507.00300 [stat.ML] (or arXiv:1507.00300v1 [stat.ML] for this version)

Submission history

From: Ian Osband
[v1] Wed, 1 Jul 2015 17:47:01 GMT (65kb,D)
