Title: The Multi-Armed Bandit Problem: An Efficient Non-Parametric Solution

Abstract: Lai and Robbins (1985) and Lai (1987) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from specified parametric families. In recent years there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Non-parametric arm allocation procedures like $\epsilon$-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under non-parametric settings. However, unlike UCB, these non-parametric procedures are not efficient under general parametric settings. In this paper we propose efficient non-parametric procedures.
Subjects: Statistics Theory (math.ST)
Cite as: arXiv:1703.08285 [math.ST]
  (or arXiv:1703.08285v4 [math.ST] for this version)
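
The abstract's description of UCB indices built from Kullback-Leibler information can be illustrated with a minimal sketch for Bernoulli rewards. This is not the non-parametric procedure proposed in the paper, and it is not claimed to match the exact bounds of Lai and Robbins (1985) or Lai (1987). It is a common KL-based UCB variant under stated assumptions: rewards are Bernoulli, the exploration budget is log(t), and the function names (bernoulli_kl, kl_ucb_index, run_kl_ucb) are hypothetical.

```python
import math
import random


def bernoulli_kl(p, q):
    """Kullback-Leibler divergence KL(Bernoulli(p) || Bernoulli(q))."""
    eps = 1e-12  # clamp away from 0 and 1 so the logarithms stay finite
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t):
    """Approximate the largest q in [mean, 1] with pulls * KL(mean, q) <= log(t).

    The log(t) budget is an assumption of this sketch; bisection works
    because KL(mean, q) is increasing in q for q >= mean.
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def run_kl_ucb(probs, horizon, seed=0):
    """Simulate the allocation rule on Bernoulli arms; return pull counts per arm."""
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k
    wins = [0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialise the estimates
        else:
            # pull the arm whose upper confidence bound is largest
            arm = max(
                range(k),
                key=lambda a: kl_ucb_index(wins[a] / pulls[a], pulls[a], t),
            )
        reward = 1 if rng.random() < probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
    return pulls


if __name__ == "__main__":
    print(run_kl_ucb([0.2, 0.8], horizon=2000))
```

In this sketch the index of an arm shrinks toward its sample mean as the arm is pulled more often, so inferior arms are sampled only as often as needed to rule them out. The index depends on the Bernoulli form of the divergence, which is the kind of parametric assumption the abstract contrasts with non-parametric procedures.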

Submission history

From: Hock Peng Chan
[v1] Fri, 24 Mar 2017 04:51:03 GMT (16kb)
[v2] Mon, 3 Apr 2017 06:22:40 GMT (17kb)
[v3] Thu, 28 Sep 2017 05:47:32 GMT (23kb)
[v4] Wed, 16 Jan 2019 05:14:10 GMT (28kb)
