# Computer Science > Machine Learning

# Rotting Infinitely Many-armed Bandits

(Submitted on 31 Jan 2022 (v1), last revised 13 Jul 2022 (this version, v2))

Abstract: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound where $T$ is the horizon time. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.
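As a reading aid, the regimes implied by the bounds quoted in the abstract can be worked out by equating the two terms inside each maximum. This is a short derivation from the stated rates only, ignoring constants and poly-logarithmic factors; it is not taken from the paper's own analysis.

```latex
% Known \varrho: the two terms of the matching bound balance when
\varrho^{1/3} T = \sqrt{T}
\iff \varrho^{1/3} = T^{-1/2}
\iff \varrho = T^{-3/2}.
% Hence
\max\{\varrho^{1/3} T, \sqrt{T}\} =
\begin{cases}
  \sqrt{T},          & \varrho \le T^{-3/2}, \\
  \varrho^{1/3} T,   & \varrho \ge T^{-3/2}.
\end{cases}

% Unknown \varrho: the two terms of the upper bound balance when
\varrho^{1/3} T = T^{3/4}
\iff \varrho^{1/3} = T^{-1/4}
\iff \varrho = T^{-3/4}.
% Hence
\max\{\varrho^{1/3} T, T^{3/4}\} =
\begin{cases}
  T^{3/4},           & \varrho \le T^{-3/4}, \\
  \varrho^{1/3} T,   & \varrho \ge T^{-3/4}.
\end{cases}
```

Read this way, the upper bound for unknown $\varrho$ coincides with the lower bound's rate $\varrho^{1/3}T$ (up to poly-logarithmic factors) once $\varrho \ge T^{-3/4}$, while for smaller $\varrho$ the stated bounds leave a gap between $T^{3/4}$ and $\max\{\varrho^{1/3}T,\sqrt{T}\}$.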

## Submission history

From: Jung-Hun Kim

**[v1]** Mon, 31 Jan 2022 03:07:17 GMT (643kb,D)

**[v2]** Wed, 13 Jul 2022 04:36:54 GMT (1835kb,D)
