Bandit Learning with Positive Externalities

Shah, Virag; Blanchet, Jose; Johari, Ramesh

(Submitted on 15 Feb 2018 (v1), last revised 6 Mar 2019 (this version, v5))

Abstract: In many platforms, user arrivals exhibit a self-reinforcing behavior: future users are likely to have preferences similar to those of users who were satisfied in the past. In other words, arrivals exhibit positive externalities. We study multi-armed bandit (MAB) problems with positive externalities. We show that self-reinforcing preferences may cause standard benchmark algorithms such as the upper confidence bound (UCB) algorithm to exhibit linear regret. We develop a new algorithm, Balanced Exploration (BE), which explores arms carefully to avoid suboptimal convergence of arrivals before sufficient evidence is gathered. We also introduce an adaptive variant of BE that successively eliminates suboptimal arms. We analyze the asymptotic regret of both algorithms and establish optimality by showing that no algorithm can perform better.
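The abstract describes the self-reinforcing arrival mechanism and Balanced Exploration only at a high level. The following is a minimal Python sketch of one way to simulate such a setting, under stated assumptions: the arriving user prefers arm a with probability proportional to (theta_a + S_a)^alpha, where S_a counts past successes on arm a; a reward occurs only when the recommended arm matches the user's preference, with probability mu_a. The exploration rule (recommend the arm with the fewest successes so far, then commit to the arm with the highest observed success rate) is a simplified rule in the spirit of BE, not the authors' exact algorithm or its regret-optimal tuning. The names `arrival_probs`, `balanced_exploration`, `theta`, `alpha`, `mu` and `explore_steps` are hypothetical and chosen for illustration only.

```python
import random
from typing import List, Optional, Tuple


def arrival_probs(successes: List[int], theta: List[float], alpha: float) -> List[float]:
    """Hypothetical self-reinforcing arrival model (an assumption for illustration):
    P(next user prefers arm a) is proportional to (theta[a] + successes[a]) ** alpha.
    theta[a] > 0 is an initial bias; alpha >= 0 sets the strength of the externality."""
    weights = [(th + s) ** alpha for th, s in zip(theta, successes)]
    total = sum(weights)
    return [w / total for w in weights]


def _best_empirical_arm(successes: List[int], pulls: List[int]) -> int:
    # Highest observed success rate; arms never pulled count as rate 0.
    return max(range(len(pulls)),
               key=lambda a: successes[a] / pulls[a] if pulls[a] else 0.0)


def balanced_exploration(mu: List[float], theta: List[float], alpha: float,
                         horizon: int, explore_steps: int,
                         seed: int = 0) -> Tuple[int, List[int], int]:
    """Simplified BE-style policy (illustrative sketch, not the paper's algorithm).

    Exploration: recommend the arm with the fewest successes so far, which keeps
    every arm's arrival weight comparable before evidence is gathered.
    Exploitation: commit to the arm with the highest observed success rate.
    Returns (committed arm, successes per arm, total reward)."""
    rng = random.Random(seed)
    m = len(mu)
    successes = [0] * m
    pulls = [0] * m
    total_reward = 0
    committed: Optional[int] = None
    for t in range(horizon):
        if t < explore_steps:
            # Ties go to the lowest index because min() returns the first minimum.
            arm = min(range(m), key=lambda a: successes[a])
        else:
            if committed is None:
                committed = _best_empirical_arm(successes, pulls)
            arm = committed
        # Draw the arriving user's preferred arm from the current arrival model.
        probs = arrival_probs(successes, theta, alpha)
        user_type = rng.choices(range(m), weights=probs)[0]
        # Assumed reward rule: success only if the recommendation matches the
        # user's preference, and then with probability mu[arm].
        reward = 1 if user_type == arm and rng.random() < mu[arm] else 0
        pulls[arm] += 1
        successes[arm] += reward
        total_reward += reward
    if committed is None:  # the horizon ended during exploration
        committed = _best_empirical_arm(successes, pulls)
    return committed, successes, total_reward
```

Balancing successes rather than pulls is the design choice the sketch is meant to show: because arrivals in this assumed model depend on past successes, equalizing successes keeps the arms' arrival weights comparable during exploration, whereas a rule that follows early empirical winners can let one arm attract most future users before its quality is known.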

Comments:	31 pages, 1 table, 2 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1802.05693 [cs.LG]
	(or arXiv:1802.05693v5 [cs.LG] for this version)

Submission history

From: Virag Shah
[v1] Thu, 15 Feb 2018 18:22:06 GMT (284kb)
[v2] Sat, 21 Apr 2018 23:57:43 GMT (279kb)
[v3] Sat, 2 Jun 2018 21:35:16 GMT (407kb)
[v4] Fri, 26 Oct 2018 23:34:54 GMT (467kb,D)
[v5] Wed, 6 Mar 2019 20:56:15 GMT (459kb,D)