Computer Science > Computer Science and Game Theory
Title: Coordination without communication: optimal regret in two players multi-armed bandits
(Submitted on 14 Feb 2020 (v1), last revised 9 Jul 2020 (this version, v2))
Abstract: We consider two agents simultaneously playing the same stochastic three-armed bandit problem. The two agents cooperate but cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full-information variant of the problem.
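To make the quantities in the abstract concrete, here is a minimal Python sketch of how pseudo-regret and collisions could be counted for two players on three arms. It is an illustration only, not the paper's strategy: the function name `pseudo_regret`, the arm means, and the convention that a collision yields zero reward are assumptions not stated in the abstract.

```python
import random


def pseudo_regret(means, choices_a, choices_b):
    """Return (pseudo-regret, collision count) for two players.

    Assumption (not from the abstract): when both players pull the
    same arm in a round, that round yields zero reward.
    """
    # Benchmark: the two players sit on the two best distinct arms.
    best_pair = sum(sorted(means, reverse=True)[:2])
    regret = 0.0
    collisions = 0
    for a, b in zip(choices_a, choices_b):
        if a == b:
            collisions += 1
            expected = 0.0
        else:
            expected = means[a] + means[b]
        regret += best_pair - expected
    return regret, collisions


# Hypothetical arm means and horizon, chosen only for illustration.
means = [0.9, 0.5, 0.2]
T = 1000
rng = random.Random(0)

# Oracle assignment: no collisions and zero pseudo-regret.
fixed = pseudo_regret(means, [0] * T, [1] * T)

# Uncoordinated uniform play: collisions occur and regret grows linearly in T.
uniform = pseudo_regret(
    means,
    [rng.randrange(3) for _ in range(T)],
    [rng.randrange(3) for _ in range(T)],
)
```

The oracle assignment requires knowing the arm means in advance. The difficulty the abstract addresses is reaching a comparable collision-free outcome while learning the means from samples and without any communication between the players.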
Submission history
From: Thomas Budzinski
[v1] Fri, 14 Feb 2020 17:35:42 GMT (29kb)
[v2] Thu, 9 Jul 2020 19:11:02 GMT (29kb)