Current browse context:
cs.LG
Change to browse by:
References & Citations
Computer Science > Machine Learning
Title: Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits
(Submitted on 12 Sep 2017 (v1), last revised 30 Nov 2018 (this version, v2))
Abstract: In this paper, we propose and study opportunistic bandits - a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load/price is low and exploit more when the load/price is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves $O(\log T)$ regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves $O(1)$ regret with respect to $T$ if the exploration cost is zero when the load level is below a certain threshold. Last, based on both synthetic data and real-world traces, experimental results show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load/price fluctuations.
Submission history
From: Xueying Guo [view email][v1] Tue, 12 Sep 2017 18:18:33 GMT (239kb,D)
[v2] Fri, 30 Nov 2018 18:38:12 GMT (1273kb,D)
Link back to: arXiv, form interface, contact.