
Title: An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits

Abstract: In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to an optimal regret that is logarithmic with respect to the number of episodes.
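To make the exploration/exploitation trade-off and the annealed information parameter concrete, here is a minimal sketch. It assumes the value-of-information policy reduces to a softmax (Gibbs) distribution over estimated arm rewards, with a temperature that controls how much policy information is allowed. The paper's exact update may differ, for example by weighting with prior action probabilities. The helper names `gibbs_policy` and `run_bandit`, the Bernoulli reward model, and the cooling schedule `tau0 / log(t + 2)` are all hypothetical illustrations. This sketch is not claimed to reproduce the paper's logarithmic-regret guarantee.

```python
import math
import random


def gibbs_policy(q_values, temperature):
    """Softmax (Gibbs) action probabilities over estimated arm values.

    High temperature spreads probability mass across arms (exploration);
    low temperature concentrates it on the best estimate (exploitation).
    """
    best = max(q_values)
    # Subtract the max before exponentiating for numerical stability.
    weights = [math.exp((q - best) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(arm_means, episodes, tau0=1.0, seed=0):
    """Simulate a Bernoulli bandit under an annealed softmax policy.

    Hypothetical cooling schedule: temperature = tau0 / log(t + 2),
    so the policy moves from exploration toward exploitation over time.
    Returns the cumulative expected (pseudo-)regret after each episode.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k
    q_values = [0.0] * k  # running sample-mean reward per arm
    best_mean = max(arm_means)
    regret = 0.0
    history = []
    for t in range(episodes):
        temperature = tau0 / math.log(t + 2)
        probs = gibbs_policy(q_values, temperature)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample-mean estimate for the pulled arm.
        q_values[arm] += (reward - q_values[arm]) / counts[arm]
        # Pseudo-regret: gap between the best arm's mean and the chosen arm's mean.
        regret += best_mean - arm_means[arm]
        history.append(regret)
    return history
```

For example, `run_bandit([0.2, 0.5, 0.8], episodes=5000)` returns the regret curve for three arms. Plotting it against the episode count shows how quickly the chosen schedule shifts the policy from exploration to exploitation. A faster-cooling schedule exploits sooner but risks locking onto a suboptimal arm.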
Comments: Entropy
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
DOI: 10.3390/e20030155
Cite as: arXiv:1710.02869 [cs.AI]
  (or arXiv:1710.02869v2 [cs.AI] for this version)

Submission history

From: Isaac Sledge
[v1] Sun, 8 Oct 2017 18:48:48 GMT (4984kb,D)
[v2] Sat, 3 Mar 2018 21:01:57 GMT (2679kb,D)