Title: Adaptation to the Range in $K$-Armed Bandits

Abstract: We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m,M]$. We do not assume that the range $[m,M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents simultaneously achieving the typical $\ln T$ and $\sqrt{T}$ bounds. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the rates for regret indicated by the new trade-off.
Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as: arXiv:2006.03378 [math.ST]
  (or arXiv:2006.03378v3 [math.ST] for this version)

Submission history

From: Gilles Stoltz
[v1] Fri, 5 Jun 2020 11:26:35 GMT (1460kb,D)
[v2] Thu, 12 Nov 2020 08:56:39 GMT (1927kb,D)
[v3] Wed, 15 Jun 2022 10:34:03 GMT (827kb,D)
