An Analysis of the Value of Information when Exploring Stochastic, Discrete Multi-Armed Bandits

Sledge, Isaac J.; Principe, Jose C.

doi:10.3390/e20030155

(Submitted on 8 Oct 2017 (v1), last revised 3 Mar 2018 (this version, v2))

Abstract: In this paper, we propose an information-theoretic exploration strategy for stochastic, discrete multi-armed bandits that achieves optimal regret. Our strategy is based on the value of information criterion. This criterion measures the trade-off between policy information and obtainable rewards. High amounts of policy information are associated with exploration-dominant searches of the space and yield high rewards. Low amounts of policy information favor the exploitation of existing knowledge. Information, in this criterion, is quantified by a parameter that can be varied during search. We demonstrate that a simulated-annealing-like update of this parameter, with a sufficiently fast cooling schedule, leads to an optimal regret that is logarithmic with respect to the number of episodes.
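The idea of a search parameter that is cooled over episodes can be illustrated with a minimal sketch. This is not the paper's value-of-information update: it substitutes plain soft-max (Boltzmann) arm selection on a Bernoulli bandit, and the function names `softmax_probs` and `run_bandit`, the arm means, and the cooling schedule `1 / log(k + 1)` are all assumptions made for illustration only.

```python
import math
import random


def softmax_probs(values, temperature):
    """Boltzmann (Gibbs) distribution over arms for a given temperature."""
    peak = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - peak) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(arm_means, episodes, seed=0):
    """Play a Bernoulli bandit with an annealed soft-max policy.

    Returns (cumulative expected regret, pull counts, reward estimates).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    estimates = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms       # number of pulls per arm
    best = max(arm_means)
    regret = 0.0
    for k in range(1, episodes + 1):
        # Hypothetical cooling schedule (an assumption, not the paper's):
        # a high temperature early favors exploration, a low one exploitation.
        temperature = 1.0 / math.log(k + 1.0)
        probs = softmax_probs(estimates, temperature)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the sample mean for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        regret += best - arm_means[arm]
    return regret, counts, estimates


if __name__ == "__main__":
    total_regret, pulls, means = run_bandit([0.2, 0.5, 0.8], episodes=2000)
    print("cumulative expected regret:", round(total_regret, 2))
    print("pulls per arm:", pulls)
```

Varying the schedule in `run_bandit` is one way to see the trade-off the abstract describes: slower cooling keeps the policy exploratory for longer, while faster cooling commits to the current estimates sooner.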

Comments:	Entropy
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
DOI:	10.3390/e20030155
Cite as:	arXiv:1710.02869 [cs.AI]
	(or arXiv:1710.02869v2 [cs.AI] for this version)

Submission history

From: Isaac Sledge
[v1] Sun, 8 Oct 2017 18:48:48 GMT (4984kb,D)
[v2] Sat, 3 Mar 2018 21:01:57 GMT (2679kb,D)
