Learning Infinite-Horizon Average-Reward Markov Decision Processes with Constraints

Chen, Liyu; Jain, Rahul; Luo, Haipeng

(Submitted on 31 Jan 2022)

Abstract: We study regret minimization for infinite-horizon average-reward Markov Decision Processes (MDPs) under cost constraints. We first design a policy optimization algorithm with a carefully constructed action-value estimator and bonus term, and show that for ergodic MDPs it ensures $\widetilde{O}(\sqrt{T})$ regret and constant constraint violation, where $T$ is the total number of time steps. This strictly improves over the algorithm of Singh et al. (2020), whose regret and constraint violation are both $\widetilde{O}(T^{2/3})$. Next, we consider the most general class of weakly communicating MDPs. Through a finite-horizon approximation, we develop another algorithm with $\widetilde{O}(T^{2/3})$ regret and constraint violation, which can be further improved to $\widetilde{O}(\sqrt{T})$ via a simple modification, at the cost of making the algorithm computationally inefficient. As far as we know, these are the first provable algorithms for weakly communicating MDPs with cost constraints.
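
To give a rough sense of the gap between the two rates quoted in the abstract, the following minimal sketch compares $\sqrt{T}$ and $T^{2/3}$ numerically. It is only an illustration: it ignores the constants and logarithmic factors hidden by $\widetilde{O}(\cdot)$, and the function names `rate_sqrt` and `rate_two_thirds` are hypothetical helpers, not anything defined in the paper.

```python
import math


def rate_sqrt(T: int) -> float:
    """Growth of a sqrt(T) bound, ignoring constants and log factors."""
    return math.sqrt(T)


def rate_two_thirds(T: int) -> float:
    """Growth of a T^(2/3) bound, ignoring constants and log factors."""
    return T ** (2.0 / 3.0)


if __name__ == "__main__":
    for T in (10**3, 10**6, 10**9):
        # The ratio of the two rates is T^(1/6), so the gap widens slowly with T.
        ratio = rate_two_thirds(T) / rate_sqrt(T)
        print(
            f"T={T:>10}  sqrt(T)={rate_sqrt(T):.1f}  "
            f"T^(2/3)={rate_two_thirds(T):.1f}  ratio={ratio:.2f}"
        )
```

Under these simplifications the two bounds differ by a factor of $T^{1/6}$, and both are sublinear, so the corresponding per-step averages tend to zero as $T$ grows.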

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2202.00150 [cs.LG]
	(or arXiv:2202.00150v1 [cs.LG] for this version)

Submission history

From: Liyu Chen
[v1] Mon, 31 Jan 2022 23:52:34 GMT (74kb)
