Coordination without communication: optimal regret in two players multi-armed bandits

Bubeck, Sébastien; Budzinski, Thomas

(Submitted on 14 Feb 2020 (v1), last revised 9 Jul 2020 (this version, v2))

Abstract: We consider two agents playing simultaneously the same stochastic three-armed bandit problem. The two agents are cooperating but they cannot communicate. We propose a strategy with no collisions at all between the players (with very high probability), and with near-optimal regret $O(\sqrt{T \log(T)})$. We also argue that the extra logarithmic term $\sqrt{\log(T)}$ should be necessary by proving a lower bound for a full information variant of the problem.

Comments:	28 pages, 5 figures. V2: minor revision
Subjects:	Computer Science and Game Theory (cs.GT); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Journal reference:	COLT 2020
Cite as:	arXiv:2002.07596 [cs.GT]
	(or arXiv:2002.07596v2 [cs.GT] for this version)

Submission history

From: Thomas Budzinski
[v1] Fri, 14 Feb 2020 17:35:42 GMT (29kb)
[v2] Thu, 9 Jul 2020 19:11:02 GMT (29kb)
