Title: A Model-free Learning Algorithm for Infinite-horizon Average-reward MDPs with Near-optimal Regret
(Submitted on 8 Jun 2020 (v1), last revised 9 Dec 2020 (this version, v2))
Abstract: Recently, model-free reinforcement learning has attracted research attention due to its simplicity, memory and computational efficiency, and flexibility to be combined with function approximation. In this paper, we propose Exploration Enhanced Q-learning (EE-QL), a model-free algorithm for infinite-horizon average-reward Markov Decision Processes (MDPs) that achieves a regret bound of $O(\sqrt{T})$ for the general class of weakly communicating MDPs, where $T$ is the number of interactions. EE-QL assumes that an online concentrating approximation of the optimal average reward is available. This is the first model-free learning algorithm that achieves $O(\sqrt{T})$ regret without the ergodic assumption, and it matches the lower bound in terms of $T$ up to logarithmic factors. Experiments show that the proposed algorithm performs as well as the best-known model-based algorithms.
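For readers unfamiliar with the quantity being bounded, the regret in the average-reward setting is commonly defined as follows. This is the standard textbook form, not a formula quoted from the abstract; the symbols $J^*$ (optimal average reward), $s_t$, $a_t$ (state and action at step $t$), and $r$ (reward function) are notational assumptions introduced here.

$$
R_T \;=\; T\,J^* \;-\; \sum_{t=1}^{T} r(s_t, a_t)
$$

Under this definition, an $O(\sqrt{T})$ bound means the per-step gap $R_T / T$ between the learner and an optimal policy shrinks at rate $O(1/\sqrt{T})$.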
Submission history
From: Mehdi Jafarnia-Jahromi
[v1] Mon, 8 Jun 2020 05:09:32 GMT (965kb,D)
[v2] Wed, 9 Dec 2020 00:05:18 GMT (0kb,I)