We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

stat.ML

Change to browse by:

References & Citations

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Statistics > Machine Learning

Title: Kiefer Wolfowitz Algorithm is Asymptotically Optimal for a Class of Non-Stationary Bandit Problems

Abstract: We consider the problem of designing an allocation rule or an "online learning algorithm" for a class of bandit problems in which the set of control actions available at each time $s$ is a convex, compact subset of $\mathbb{R}^d$. Upon choosing an action $x$ at time $s$, the algorithm obtains a noisy value of the unknown and time-varying function $f_s$ evaluated at $x$. The "regret" of an algorithm is the gap between its expected reward, and the reward earned by a strategy which has the knowledge of the function $f_s$ at each time $s$ and hence chooses the action $x_s$ that maximizes $f_s$.
For this non-stationary bandit problem set-up, we consider two variants of the Kiefer Wolfowitz (KW) algorithm i) KW with fixed step-size $\beta$, and ii) KW with sliding window of length $L$. We show that if the number of times that the function $f_s$ varies during time $T$ is $o(T)$, and if the learning rates of the proposed algorithms are chosen "optimally", then the regret of the proposed algorithms is $o(T)$, and hence the algorithms are asymptotically efficient.
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:1702.08000 [stat.ML]
  (or arXiv:1702.08000v2 [stat.ML] for this version)

Submission history

From: Rahul Singh [view email]
[v1] Sun, 26 Feb 2017 08:24:11 GMT (66kb,D)
[v2] Wed, 8 Mar 2017 08:38:08 GMT (70kb,D)

Link back to: arXiv, form interface, contact.