Robustness of Anytime Bandit Policies

Salomon, Antoine; Audibert, Jean-Yves

(Submitted on 22 Jul 2011 (v1), last revised 25 Jul 2011 (this version, v2))

Abstract: This paper studies the deviations of the regret in a stochastic multi-armed bandit problem. When the total number of plays n is known to the agent beforehand, Audibert et al. (2009) exhibit a policy such that, with probability at least 1 - 1/n, the regret of the policy is of order log(n). They have also shown that this property is not shared by the popular UCB1 policy of Auer et al. (2002). This work first answers an open question: it extends this negative result to any anytime policy. The second contribution of this paper is to design robust anytime policies for specific multi-armed bandit problems in which some restrictions are put on the set of possible distributions of the different arms.
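
As an illustration of the quantities the abstract refers to, the following is a minimal sketch, not taken from the paper, that simulates the UCB1 index policy of Auer et al. (2002) on Bernoulli arms and returns the pseudo-regret after n plays. The function name `ucb1_pseudo_regret`, the choice of Bernoulli rewards, and the use of pseudo-regret (the summed gaps of the arms played) as the regret measure are all assumptions made for this example.

```python
import math
import random


def ucb1_pseudo_regret(means, n, seed=0):
    """Simulate UCB1 for n plays on Bernoulli arms (hypothetical helper).

    means: expected reward of each arm (an assumption of this sketch).
    Returns the pseudo-regret: the sum over plays of (best mean - mean of arm played).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of times each arm has been played
    sums = [0.0] * k      # cumulative reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(1, n + 1):
        if t <= k:
            # Initialisation: play each arm once.
            arm = t - 1
        else:
            # UCB1 index: empirical mean plus sqrt(2 ln(plays so far) / plays of arm).
            plays_so_far = t - 1
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(plays_so_far) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    # Repeat the experiment over many seeds to look at the spread of the regret,
    # not only its average.
    regrets = sorted(ucb1_pseudo_regret([0.5, 0.6], 2000, seed=s) for s in range(200))
    print("median regret:", regrets[len(regrets) // 2])
    print("largest regret:", regrets[-1])
```

Comparing the median with the largest value over many seeds gives a rough empirical view of the deviations of the regret, as opposed to its expectation. The policy above never uses n when choosing an arm, which is what makes it anytime.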

Subjects:	Machine Learning (stat.ML)
Cite as:	arXiv:1107.4506 [stat.ML]
	(or arXiv:1107.4506v2 [stat.ML] for this version)

Submission history

From: Antoine Salomon
[v1] Fri, 22 Jul 2011 12:55:34 GMT (6191kb,D)
[v2] Mon, 25 Jul 2011 12:52:17 GMT (6191kb,D)
