Optimal No-Regret Learning in Strongly Monotone Games with Bandit Feedback

Lin, Tianyi; Zhou, Zhengyuan; Ba, Wenjia; Zhang, Jiawei

(Submitted on 6 Dec 2021 (this version), latest version 29 Mar 2024 (v4))

Abstract: We consider online no-regret learning in unknown games with bandit feedback, where each agent only observes its reward at each time -- determined by all players' current joint action -- rather than its gradient. We focus on the class of smooth and strongly monotone games and study optimal no-regret learning therein. Leveraging self-concordant barrier functions, we first construct an online bandit convex optimization algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(\sqrt{T})$ under smooth and strongly concave payoff functions. We then show that if each agent applies this no-regret learning algorithm in strongly monotone games, the joint action converges in \textit{last iterate} to the unique Nash equilibrium at a rate of $\tilde{\Theta}(1/\sqrt{T})$. Prior to our work, the best-known convergence rate in the same class of games was $O(1/T^{1/3})$ (achieved by a different algorithm), thus leaving open the problem of optimal no-regret learning algorithms (since the known lower bound is $\Omega(1/\sqrt{T})$). Our results settle this open problem and contribute to the broad landscape of bandit game-theoretical learning by identifying the first doubly optimal bandit learning algorithm, in that it achieves (up to log factors) both optimal regret in single-agent learning and optimal last-iterate convergence rate in multi-agent learning. We also present results from several simulation studies -- Cournot competition, Kelly auctions, and distributed regularized logistic regression -- to demonstrate the efficacy of our algorithm.
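
To make the bandit-feedback setting concrete, the following is a minimal Python sketch. It is not the authors' algorithm, which the abstract says relies on self-concordant barrier functions; it only illustrates a textbook single-point gradient estimator in a two-firm Cournot game with a linear price. The constants `A`, `B`, `C`, the perturbation radius `delta`, and all function names are illustrative assumptions, not taken from the paper. Each firm observes only its own profit at a randomly perturbed joint action, and averaging the resulting estimate over all perturbation signs recovers the gradient it never sees directly.

```python
import itertools

# Hypothetical Cournot parameters (assumed, not from the paper):
# market price = A - B * (total quantity), constant unit cost C.
A, B, C = 10.0, 1.0, 1.0


def payoff(i, x):
    """Profit of firm i at joint quantities x: the only signal firm i observes."""
    return x[i] * (A - B * sum(x)) - C * x[i]


def true_gradient(i, x):
    """Derivative of firm i's profit in its own quantity (hidden from the firm)."""
    return A - C - B * sum(x) - B * x[i]


def one_point_estimate(i, x, z, delta):
    """Single-point gradient estimate for firm i from one payoff observation.

    Every firm j plays x[j] + delta * z[j] with z[j] in {-1, +1}; firm i
    rescales its observed profit by z[i] / delta (action dimension is 1).
    """
    perturbed = [xj + delta * zj for xj, zj in zip(x, z)]
    return payoff(i, perturbed) * z[i] / delta


def expected_estimate(i, x, delta):
    """Exact expectation of the estimate: average over all 2^n sign patterns."""
    signs = list(itertools.product((-1.0, 1.0), repeat=len(x)))
    return sum(one_point_estimate(i, x, z, delta) for z in signs) / len(signs)


x = [2.0, 3.0]   # current joint action (quantities)
delta = 0.5      # perturbation radius
estimates = [expected_estimate(i, x, delta) for i in range(len(x))]
gradients = [true_gradient(i, x) for i in range(len(x))]
print(estimates, gradients)
```

Because each profit here is quadratic in the firm's own quantity and linear in its rival's, the averaged estimate matches the true gradient exactly. For general smooth payoffs it would match only up to a bias that shrinks as `delta` decreases, while the variance of a single estimate grows, which is the trade-off that bandit algorithms in this setting have to manage.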

Comments:	40 pages, 3 figures
Subjects:	Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Optimization and Control (math.OC)
Cite as:	arXiv:2112.02856 [cs.LG]
	(or arXiv:2112.02856v1 [cs.LG] for this version)

Submission history

From: Wenjia Ba
[v1] Mon, 6 Dec 2021 08:27:54 GMT (408kb)
[v2] Wed, 8 Dec 2021 02:06:50 GMT (366kb)
[v3] Sun, 10 Jul 2022 01:29:19 GMT (1045kb)
[v4] Fri, 29 Mar 2024 04:18:14 GMT (444kb)
