Mathematics > Statistics Theory
Title: The Multi-Armed Bandit Problem: An Efficient Non-Parametric Solution
(Submitted on 24 Mar 2017 (v1), last revised 16 Jan 2019 (this version, v4))
Abstract: Lai and Robbins (1985) and Lai (1987) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback-Leibler information of the reward distributions, estimated from specified parametric families. In recent years there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Non-parametric arm allocation procedures like $\epsilon$-greedy, Boltzmann exploration and BESA were studied, and modified versions of the UCB procedure were also analyzed under non-parametric settings. However, unlike UCB, these non-parametric procedures are not efficient under general parametric settings. In this paper we propose efficient non-parametric procedures.
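To make the allocation rules named in the abstract concrete, the following is a minimal sketch of two textbook baselines on Bernoulli arms: $\epsilon$-greedy and the classical UCB1 index (sample mean plus $\sqrt{2 \log t / n}$). It is not the procedure proposed in this paper, and it is not the Kullback-Leibler based UCB of Lai and Robbins. The function names `ucb1_index` and `run_bandit`, the Bernoulli reward model, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def ucb1_index(mean, pulls, total_pulls):
    """Classical UCB1 index: sample mean plus an exploration bonus.

    Illustrative only; the bonus sqrt(2 log t / n) is the textbook UCB1
    form, not the KL-based bound discussed in the abstract.
    """
    return mean + math.sqrt(2.0 * math.log(total_pulls) / pulls)


def run_bandit(arm_probs, horizon, policy, epsilon=0.1, seed=0):
    """Simulate a Bernoulli bandit; return (pull counts, cumulative regret).

    arm_probs: success probability of each arm (hypothetical test values).
    policy: "ucb1" or "epsilon_greedy".
    Regret is accumulated in expectation: best mean minus chosen mean.
    """
    rng = random.Random(seed)
    k = len(arm_probs)
    pulls = [0] * k
    sums = [0.0] * k
    best = max(arm_probs)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialisation: play every arm once so sample means exist.
            arm = t - 1
        elif policy == "ucb1":
            # Play the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda a: ucb1_index(sums[a] / pulls[a], pulls[a], t - 1),
            )
        elif policy == "epsilon_greedy":
            if rng.random() < epsilon:
                arm = rng.randrange(k)  # explore uniformly at random
            else:
                arm = max(range(k), key=lambda a: sums[a] / pulls[a])  # exploit
        else:
            raise ValueError(f"unknown policy: {policy}")

        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - arm_probs[arm]

    return pulls, regret


if __name__ == "__main__":
    for name in ("ucb1", "epsilon_greedy"):
        counts, reg = run_bandit([0.2, 0.8], horizon=500, policy=name)
        print(name, counts, round(reg, 2))
```

With a fixed $\epsilon$, the $\epsilon$-greedy rule keeps exploring at a constant rate, so its regret grows linearly in the horizon, whereas UCB-type indices shrink the exploration bonus as an arm accumulates pulls.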
Submission history
From: Hock Peng Chan
[v1] Fri, 24 Mar 2017 04:51:03 GMT (16kb)
[v2] Mon, 3 Apr 2017 06:22:40 GMT (17kb)
[v3] Thu, 28 Sep 2017 05:47:32 GMT (23kb)
[v4] Wed, 16 Jan 2019 05:14:10 GMT (28kb)