The Multi-Armed Bandit Problem: An Efficient Non-Parametric Solution

Chan, Hock Peng

Authors: Hock Peng Chan

(Submitted on 24 Mar 2017 (v1), last revised 16 Jan 2019 (this version, v4))

Abstract: Lai and Robbins (1985) and Lai (1987) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from specified parametric families. In recent years there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Non-parametric arm allocation procedures like $\epsilon$-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under non-parametric settings. However, unlike UCB, these non-parametric procedures are not efficient under general parametric settings. In this paper we propose efficient non-parametric procedures.
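
For readers unfamiliar with the allocation rules named in the abstract, the following is a minimal sketch of two textbook procedures, $\epsilon$-greedy and a UCB1-style index rule, on Bernoulli arms. It is not the procedure proposed in the paper, and the index used here is the simple square-root bonus rather than the Kullback-Leibler based bounds of Lai and Robbins. The function names, the Bernoulli reward model, and the parameter values are all assumptions chosen for illustration.

```python
import math
import random


def pull(mean, rng):
    """Draw a Bernoulli reward with success probability `mean` (assumed model)."""
    return 1.0 if rng.random() < mean else 0.0


def run_bandit(means, horizon, choose, seed=0):
    """Play `horizon` rounds using the allocation rule `choose`.

    `choose(t, counts, sums, rng)` returns the index of the arm to pull.
    Returns (pull counts per arm, cumulative pseudo-regret).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls of each arm
    sums = [0.0] * k      # total reward collected from each arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        # Pull every arm once before applying the allocation rule.
        arm = t - 1 if t <= k else choose(t, counts, sums, rng)
        reward = pull(means[arm], rng)
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: expected reward lost by not pulling the best arm.
        regret += best - means[arm]
    return counts, regret


def epsilon_greedy(epsilon):
    """Explore a uniformly random arm with probability epsilon, else exploit."""
    def choose(t, counts, sums, rng):
        if rng.random() < epsilon:
            return rng.randrange(len(counts))
        return max(range(len(counts)), key=lambda a: sums[a] / counts[a])
    return choose


def ucb1(t, counts, sums, rng):
    """Pull the arm with the largest sample mean plus exploration bonus."""
    def index(a):
        return sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
    return max(range(len(counts)), key=index)
```

A call such as `run_bandit([0.2, 0.8], 2000, ucb1)` returns the number of pulls of each arm and the accumulated pseudo-regret; replacing `ucb1` with `epsilon_greedy(0.1)` runs the other rule on the same arms. With a fixed exploration rate, $\epsilon$-greedy keeps pulling inferior arms at a constant rate, so its regret grows linearly in the horizon, whereas UCB-type rules aim for logarithmic growth.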

Subjects:	Statistics Theory (math.ST)
Cite as:	arXiv:1703.08285 [math.ST]
	(or arXiv:1703.08285v4 [math.ST] for this version)

Submission history

From: Hock Peng Chan
[v1] Fri, 24 Mar 2017 04:51:03 GMT (16kb)
[v2] Mon, 3 Apr 2017 06:22:40 GMT (17kb)
[v3] Thu, 28 Sep 2017 05:47:32 GMT (23kb)
[v4] Wed, 16 Jan 2019 05:14:10 GMT (28kb)
