Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

Kozuno, Tadashi; Ménard, Pierre; Munos, Rémi; Valko, Michal

(Submitted on 11 Jun 2021)

Abstract: We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular IIGs under the perfect-recall assumption, where the only feedback is realizations of the game (bandit feedback). In particular, the dynamics of the IIG are not known: we can only access them by sampling or by interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order $1/\sqrt{T}$, where $T$ is the number of played games. Moreover, IXOMD is computationally efficient, as it needs to perform the updates only along the sampled trajectory.

Comments:	20 pages
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2106.06279 [stat.ML]
	(or arXiv:2106.06279v1 [stat.ML] for this version)

Submission history

From: Pierre Menard
[v1] Fri, 11 Jun 2021 09:51:29 GMT (575kb)
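
The abstract names two ingredients, implicit exploration and online mirror descent under bandit feedback. As a rough illustration only, the sketch below combines them at a single decision point (a multi-armed bandit with fixed losses), not on the game tree the paper studies. It uses the standard implicit-exploration loss estimate (observed loss divided by the action's probability plus a bias term `gamma`) together with an entropic mirror descent step, which reduces to exponential weights. The function name, the parameter values, and the fixed-loss setup are all assumptions made for this example; this is not the authors' IXOMD implementation.

```python
import math
import random


def ix_exponential_weights(losses, n_rounds, eta, gamma, seed=0):
    """Hypothetical single-decision-point sketch (not the paper's IXOMD).

    losses   : fixed loss in [0, 1] for each action (assumed for illustration)
    n_rounds : number of plays
    eta      : mirror descent learning rate
    gamma    : implicit-exploration bias added to the sampling probability
    Returns the policy averaged over all rounds.
    """
    rng = random.Random(seed)
    n_actions = len(losses)
    cum_est = [0.0] * n_actions      # cumulative estimated losses
    avg_policy = [0.0] * n_actions

    for _ in range(n_rounds):
        # Entropic mirror descent step = exponential weights on estimates.
        # Subtracting the minimum keeps the exponentials numerically stable.
        base = min(cum_est)
        weights = [math.exp(-eta * (c - base)) for c in cum_est]
        total = sum(weights)
        policy = [w / total for w in weights]

        # Bandit feedback: only the loss of the sampled action is observed.
        action = rng.choices(range(n_actions), weights=policy)[0]
        observed = losses[action]

        # Implicit-exploration estimate: divide by (probability + gamma)
        # instead of the probability alone, trading a small bias for
        # lower variance. Only the sampled action is updated.
        cum_est[action] += observed / (policy[action] + gamma)

        for a in range(n_actions):
            avg_policy[a] += policy[a] / n_rounds

    return avg_policy


if __name__ == "__main__":
    avg = ix_exponential_weights([0.9, 0.1, 0.5], n_rounds=5000,
                                 eta=0.05, gamma=0.025, seed=0)
    print(avg)
```

In this toy setting the averaged policy should concentrate on the lowest-loss action. The update touches only the sampled action, which loosely mirrors the abstract's remark that updates are performed only along the sampled trajectory.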
