Mathematics > Statistics Theory
Title: Adaptation to the Range in $K$-Armed Bandits
(Submitted on 5 Jun 2020 (v1), last revised 15 Jun 2022 (this version, v3))
Abstract: We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m,M]$. We do not assume that the range $[m,M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents simultaneously achieving the typical $\ln T$ and $\sqrt{T}$ bounds. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the rates for regret indicated by the new trade-off.
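For readers unfamiliar with the two kinds of bounds mentioned in the abstract, the following is a minimal sketch in standard bandit notation. The symbols $\nu$, $\mu_a$, $\mu^\star$, $A_t$, $R_T$, $C$ and $C(\nu)$ are assumptions introduced here for illustration and are not taken from the abstract; the exact constants and dependence on $K$ are those of the paper, not of this sketch.

```latex
% Assumed notation: \nu = (\nu_1, \dots, \nu_K) are the arm distributions on [m, M],
% \mu_a is the mean of \nu_a, \mu^\star = \max_a \mu_a, and A_t is the arm pulled at round t.
\[
  R_T(\nu) \;=\; T \mu^\star \;-\; \mathbb{E}\!\left[ \sum_{t=1}^{T} \mu_{A_t} \right]
  \qquad \text{(expected regret after $T$ rounds)}
\]
% Distribution-free bound: uniform over all bandit problems with range [m, M].
\[
  \sup_{\nu \text{ on } [m,M]} R_T(\nu) \;\le\; C \,(M - m)\, \sqrt{T}
  \qquad \text{($C$ may depend on $K$, not on $\nu$)}
\]
% Distribution-dependent bound: the constant may depend on the specific problem \nu.
\[
  R_T(\nu) \;\le\; C(\nu)\, \ln T
\]
```

With a known range, both types of bounds are typically attainable by a single strategy; the abstract states that this is no longer the case when $[m,M]$ is unknown.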
Submission history
From: Gilles Stoltz
[v1] Fri, 5 Jun 2020 11:26:35 GMT (1460kb,D)
[v2] Thu, 12 Nov 2020 08:56:39 GMT (1927kb,D)
[v3] Wed, 15 Jun 2022 10:34:03 GMT (827kb,D)