# Computer Science > Data Structures and Algorithms

# Title: Stochastic Multi-armed Bandits in Constant Space

(Submitted on 25 Dec 2017 (v1), last revised 16 May 2018 (this version, v2))

Abstract: We consider the stochastic bandit problem in the sublinear space setting, where one cannot record the win-loss record for all $K$ arms. We give an algorithm using $O(1)$ words of space with regret

\[
\sum_{i=1}^{K}\frac{1}{\Delta_i}\log \frac{\Delta_i}{\Delta}\log T
\]

where $\Delta_i$ is the gap between the best arm and arm $i$ and $\Delta$ is the gap between the best and the second-best arms. If the rewards are bounded away from $0$ and $1$, this is within an $O(\log 1/\Delta)$ factor of the optimum regret possible without space constraints.
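To get a feel for the size of this bound, the expression can be evaluated numerically for a concrete set of arm means. The following is a minimal sketch, not the paper's algorithm: the function names `constant_space_bound` and `unconstrained_reference` are hypothetical, constant factors are ignored, and the best arm (whose gap is $0$) is skipped in the sum, which is an assumption made here so that the expression is well defined. The reference quantity $\sum_i \frac{1}{\Delta_i}\log T$ is likewise an assumed stand-in for the regret achievable without space constraints.

```python
import math


def _positive_gaps(means):
    """Return the sorted gaps Delta_i for every arm except the best one."""
    best = max(means)
    gaps = sorted(best - m for m in means)[1:]  # drop the best arm (gap 0)
    if not gaps or gaps[0] <= 0:
        raise ValueError("need a unique best arm and at least two arms")
    return gaps


def constant_space_bound(means, horizon):
    """Evaluate sum_i (1/Delta_i) * log(Delta_i/Delta) * log(T).

    Assumption: the best arm is excluded from the sum, and constants
    hidden by the asymptotic statement are ignored.
    """
    gaps = _positive_gaps(means)
    delta = gaps[0]  # gap between the best and second-best arms
    return sum(math.log(g / delta) / g for g in gaps) * math.log(horizon)


def unconstrained_reference(means, horizon):
    """Evaluate sum_i (1/Delta_i) * log(T), an assumed reference quantity."""
    return sum(1.0 / g for g in _positive_gaps(means)) * math.log(horizon)


if __name__ == "__main__":
    arm_means = [0.9, 0.8, 0.5]  # illustrative values, not from the paper
    T = 1000
    print(constant_space_bound(arm_means, T))
    print(unconstrained_reference(arm_means, T))
```

Taken literally, the second-best arm contributes $\log(\Delta/\Delta) = 0$ to the first sum, so the numbers are only indicative of how the bound scales with the gaps and with $T$.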

## Submission history

From: Ger Yang

**[v1]** Mon, 25 Dec 2017 05:04:35 GMT (35kb)

**[v2]** Wed, 16 May 2018 17:06:53 GMT (28kb)
