Adaptation to the Range in $K$-Armed Bandits

Hadiji, Hédi; Stoltz, Gilles

(Submitted on 5 Jun 2020 (v1), last revised 15 Jun 2022 (this version, v3))

Abstract: We consider stochastic bandit problems with $K$ arms, each associated with a bounded distribution supported on the range $[m,M]$. We do not assume that the range $[m,M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents simultaneously achieving the typical $\ln T$ and $\sqrt{T}$ bounds. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the rates for regret indicated by the new trade-off.

Subjects: Statistics Theory (math.ST); Machine Learning (stat.ML)
Cite as: arXiv:2006.03378 [math.ST] (or arXiv:2006.03378v3 [math.ST] for this version)

Submission history

From: Gilles Stoltz
[v1] Fri, 5 Jun 2020 11:26:35 GMT (1460kb,D)
[v2] Thu, 12 Nov 2020 08:56:39 GMT (1927kb,D)
[v3] Wed, 15 Jun 2022 10:34:03 GMT (827kb,D)
