Computer Science > Discrete Mathematics
Title: Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs
(Submitted on 28 Nov 2022 (v1), last revised 8 Oct 2023 (this version, v2))
Abstract: Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms; another to all "max-gain" switching variants; and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.
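To make the setting concrete, here is a minimal sketch of Howard's PI on a small DMDP. It is an illustration, not the authors' code or analysis. It assumes a discounted-reward objective, which the abstract does not specify, and the names `nxt`, `rew`, `gamma`, `evaluate`, and `howard_pi` are hypothetical. Because every state-action pair has a unique next state, a fixed policy induces a graph in which each state follows a path into a cycle, so the policy can be evaluated exactly by walking that graph.

```python
from fractions import Fraction


def evaluate(policy, nxt, rew, gamma):
    """Exact discounted values of a fixed policy on a deterministic MDP.

    nxt[s][a] is the unique next state and rew[s][a] the reward for
    taking action a in state s (illustrative encoding, an assumption).
    """
    n = len(policy)
    V = [None] * n
    for start in range(n):
        if V[start] is not None:
            continue
        path, pos = [], {}
        s = start
        # Follow the policy until we reach an evaluated state or close a cycle.
        while V[s] is None and s not in pos:
            pos[s] = len(path)
            path.append(s)
            s = nxt[s][policy[s]]
        if V[s] is None:
            # A new cycle was found; it begins at path[pos[s]].
            cycle = path[pos[s]:]
            total, disc = 0, 1
            for c in cycle:
                total += disc * rew[c][policy[c]]
                disc *= gamma
            # Geometric series over repeated traversals of the cycle.
            V[cycle[0]] = total / (1 - disc)
            for c in reversed(cycle[1:]):
                V[c] = rew[c][policy[c]] + gamma * V[nxt[c][policy[c]]]
            path = path[:pos[s]]
        # States on the path leading into the evaluated part.
        for c in reversed(path):
            V[c] = rew[c][policy[c]] + gamma * V[nxt[c][policy[c]]]
    return V


def howard_pi(nxt, rew, gamma):
    """Howard's PI: switch every improvable state to a max-gain action."""
    policy = [0] * len(nxt)
    iterations = 0
    while True:
        V = evaluate(policy, nxt, rew, gamma)
        new_policy = list(policy)
        for s in range(len(nxt)):
            best_a, best_q = policy[s], V[s]
            for a in range(len(nxt[s])):
                q = rew[s][a] + gamma * V[nxt[s][a]]
                if q > best_q:  # strict improvement only
                    best_a, best_q = a, q
            new_policy[s] = best_a
        if new_policy == policy:
            return policy, V, iterations
        policy = new_policy
        iterations += 1


# Toy DMDP (made up for illustration): 2 states, 2 actions each.
nxt = [[0, 1], [1, 0]]
rew = [[0, 0], [1, 0]]
policy, V, iterations = howard_pi(nxt, rew, Fraction(1, 2))
```

Exact rational arithmetic via `Fraction` is a design choice for this sketch: it keeps the strict-improvement test free of floating-point ties, so the loop stops as soon as no state has an improving action. The value `iterations` counts policy switches, which is the quantity that upper bounds of the kind described in the abstract constrain.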
Submission history
From: Ritesh Goenka
[v1] Mon, 28 Nov 2022 17:56:30 GMT (31kb)
[v2] Sun, 8 Oct 2023 20:19:31 GMT (51kb)