Mathematics > Optimization and Control
Title: Bandit Online Learning in Pseudo-Monotone Games with Multi-Point Pseudo-Gradient Estimate
(Submitted on 29 Mar 2023 (v1), last revised 30 Mar 2023 (this version, v2))
Abstract: Non-cooperative games are a powerful framework for capturing the interactions among self-interested players and apply to a wide range of practical scenarios, from power management to drug delivery. Most existing solution algorithms assume access to first-order information or full knowledge of the objectives and of other players' action profiles, yet in some situations the only information available to players is the realized values of their objective functions. In this paper, we devise a bandit online learning algorithm that integrates the optimistic mirror descent scheme with multi-point pseudo-gradient estimates. We further demonstrate that the actual sequence of play generated by the algorithm can converge almost surely to a critical point if the game under study is merely coherent, without resorting to extra Tikhonov regularization terms or additional norm conditions. Finally, we illustrate the validity of the proposed algorithm via a Rock-Paper-Scissors game and a least-squares estimation game.
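As an illustration of what a multi-point estimate from objective values alone can look like, here is a minimal sketch of a generic two-point zeroth-order gradient estimator. It is not taken from the paper: the function names, the toy quadratic objective, the query radius `delta`, and the choice of exactly two query points are all assumptions made for this example, and the optimistic mirror descent update itself is not shown.

```python
import math
import random


def unit_direction(dim, rng):
    """Sample a direction uniformly from the unit sphere in R^dim."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def two_point_estimate(objective, x, delta, rng):
    """Estimate the gradient of `objective` at x from two function values only."""
    dim = len(x)
    u = unit_direction(dim, rng)
    # Query the objective at two symmetric perturbations of the action x.
    plus = objective([xi + delta * ui for xi, ui in zip(x, u)])
    minus = objective([xi - delta * ui for xi, ui in zip(x, u)])
    # Scaled finite difference along the random direction u.
    scale = dim * (plus - minus) / (2.0 * delta)
    return [scale * ui for ui in u]


def toy_objective(x):
    """Hypothetical quadratic cost with minimizer (1, -2); an assumption for the demo."""
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def averaged_estimate(objective, x, delta, samples, seed=0):
    """Average many independent two-point estimates to reduce their variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        g = two_point_estimate(objective, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / samples for t in total]
```

For this quadratic the true gradient at the origin is (-2, 4), and the average of many estimates should land close to it. In a game, each player would apply such an estimator to its own objective with respect to its own action, which yields one block of the pseudo-gradient.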
Submission history
From: Yuanhanqing Huang
[v1] Wed, 29 Mar 2023 03:24:52 GMT (904kb,D)
[v2] Thu, 30 Mar 2023 04:01:03 GMT (904kb,D)