Computer Science > Machine Learning
Title: Improving Regret Bounds for Combinatorial Semi-Bandits with Probabilistically Triggered Arms and Its Applications
(Submitted on 5 Mar 2017 (v1), last revised 8 Jun 2021 (this version, v5))
Abstract: We study combinatorial multi-armed bandit with probabilistically triggered arms (CMAB-T) and semi-bandit feedback. We resolve a serious issue in the prior CMAB-T studies where the regret bounds contain a possibly exponentially large factor of $1/p^*$, where $p^*$ is the minimum positive probability that an arm is triggered by any action. We address this issue by introducing a triggering probability modulated (TPM) bounded smoothness condition into the general CMAB-T framework, and show that many applications such as influence maximization bandit and combinatorial cascading bandit satisfy this TPM condition. As a result, we completely remove the factor of $1/p^*$ from the regret bounds, achieving significantly better regret bounds for influence maximization and cascading bandits than before. Finally, we provide lower bound results showing that the factor $1/p^*$ is unavoidable for general CMAB-T problems, suggesting that the TPM condition is crucial in removing this factor.
Submission history
From: Wei Chen
[v1] Sun, 5 Mar 2017 15:31:35 GMT (72kb)
[v2] Thu, 12 Oct 2017 08:25:41 GMT (70kb)
[v3] Sun, 5 Nov 2017 05:50:04 GMT (72kb)
[v4] Wed, 21 Feb 2018 19:21:09 GMT (74kb)
[v5] Tue, 8 Jun 2021 07:55:43 GMT (77kb)