We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:

Download:

Current browse context:

cs.LG

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo

Computer Science > Machine Learning

Title: Model-Free Algorithm and Regret Analysis for MDPs with Long-Term Constraints

Abstract: In the optimization of dynamical systems, the variables typically have constraints. Such problems can be modeled as a constrained Markov Decision Process (CMDP). This paper considers a model-free approach to the problem, where the transition probabilities are not known. In the presence of long-term (or average) constraints, the agent has to choose a policy that maximizes the long-term average reward as well as satisfy the average constraints in each episode. The key challenge with the long-term constraints is that the optimal policy is not deterministic in general, and thus standard Q-learning approaches cannot be directly used. This paper uses concepts from constrained optimization and Q-learning to propose an algorithm for CMDP with long-term constraints. For any $\gamma\in(0,\frac{1}{2})$, the proposed algorithm is shown to achieve $O(T^{1/2+\gamma})$ regret bound for the obtained reward and $O(T^{1-\gamma/2})$ regret bound for the constraint violation, where $T$ is the total number of steps. We note that these are the first results on regret analysis for MDP with long-term constraints, where the transition probabilities are not known apriori.
Comments: The result has error
Subjects: Machine Learning (cs.LG); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as: arXiv:2006.05961 [cs.LG]
  (or arXiv:2006.05961v2 [cs.LG] for this version)

Submission history

From: Vaneet Aggarwal [view email]
[v1] Wed, 10 Jun 2020 17:19:29 GMT (103kb)
[v2] Sat, 30 Jan 2021 21:06:30 GMT (0kb,I)

Link back to: arXiv, form interface, contact.