Computer Science > Artificial Intelligence
Title: An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits
(Submitted on 8 Oct 2017 (v1), last revised 3 Mar 2018 (this version, v2))
Abstract: In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to an optimal regret that is logarithmic with respect to the number of episodes.
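The abstract describes an exploration parameter that is annealed during the search, moving from exploration toward exploitation. The sketch below illustrates that general idea with a generic annealed soft-max (Gibbs) arm-selection rule on a Bernoulli bandit. It is not the paper's algorithm: the function names `softmax_probs` and `run_bandit`, the interpretation of the parameter as an inverse temperature, and the `log(k + 1)` schedule are all assumptions made for illustration, and no regret guarantee is claimed for this particular schedule.

```python
import math
import random


def softmax_probs(values, inv_temp):
    """Gibbs (soft-max) distribution over arms for a given inverse temperature."""
    m = max(values)  # subtract the max so the exponentials cannot overflow
    weights = [math.exp(inv_temp * (v - m)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(arm_means, episodes, seed=0):
    """Play a Bernoulli bandit with annealed soft-max exploration.

    Returns the cumulative expected regret and the per-arm pull counts.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms  # running mean reward of each arm
    best_mean = max(arm_means)
    regret = 0.0

    for k in range(1, episodes + 1):
        # ASSUMPTION: hypothetical cooling schedule, not taken from the paper.
        # A small inverse temperature explores almost uniformly; a large one
        # concentrates on the arm with the highest estimated reward.
        inv_temp = math.log(k + 1)

        probs = softmax_probs(estimates, inv_temp)
        arm = rng.choices(range(n_arms), weights=probs)[0]

        # Bernoulli reward drawn from the chosen arm's true mean.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        # Expected regret of this pull relative to the best arm.
        regret += best_mean - arm_means[arm]

    return regret, counts


if __name__ == "__main__":
    total_regret, pulls = run_bandit([0.2, 0.5, 0.8], episodes=2000)
    print("cumulative regret:", round(total_regret, 2))
    print("pulls per arm:", pulls)
```

With an inverse temperature of zero the rule pulls arms uniformly at random, and as the inverse temperature grows it approaches greedy selection. How quickly it grows is the quantity the abstract's cooling-schedule result concerns.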
Submission history
From: Isaac Sledge
[v1] Sun, 8 Oct 2017 18:48:48 GMT (4984kb,D)
[v2] Sat, 3 Mar 2018 21:01:57 GMT (2679kb,D)