Title: Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs

Abstract: Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms; another to all "max-gain" switching variants; and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.
Comments: Added new bounds for two-state MDPs
Subjects: Discrete Mathematics (cs.DM); Computational Complexity (cs.CC); Combinatorics (math.CO)
MSC classes: 90C40 (Primary) 68Q25, 05C35, 05C38 (Secondary)
Cite as: arXiv:2211.15602 [cs.DM]
  (or arXiv:2211.15602v2 [cs.DM] for this version)

Submission history

From: Ritesh Goenka
[v1] Mon, 28 Nov 2022 17:56:30 GMT (31kb)
[v2] Sun, 8 Oct 2023 20:19:31 GMT (51kb)
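
Illustrative sketch (not from the paper): the abstract defines a DMDP as an MDP in which every state-action pair has a unique next state, and refers to Howard's PI and "max-gain" switching. The Python below is a minimal sketch of one such variant under stated assumptions: a discounted-reward objective with discount factor gamma, an iterative policy evaluation, and a rule that switches every improvable state to its highest-gain action. The names DMDP, evaluate, and howard_pi, the tolerance, and the toy instance are all hypothetical and chosen only for illustration; the paper's own setting and analysis may differ.

```python
from typing import List, Tuple

# Assumed encoding: mdp[s][a] = (next_state, reward).
# Determinism means each state-action pair has exactly one next state.
DMDP = List[List[Tuple[int, float]]]


def evaluate(mdp: DMDP, policy: List[int], gamma: float, sweeps: int = 2000) -> List[float]:
    """Approximate the discounted value of a fixed policy by repeated backups."""
    v = [0.0] * len(mdp)
    for _ in range(sweeps):
        v = [
            mdp[s][policy[s]][1] + gamma * v[mdp[s][policy[s]][0]]
            for s in range(len(mdp))
        ]
    return v


def howard_pi(mdp: DMDP, gamma: float = 0.9, tol: float = 1e-9):
    """Switch every improvable state to its max-gain action until none remain."""
    policy = [0] * len(mdp)  # arbitrary initial policy: first action everywhere
    iterations = 0
    while True:
        v = evaluate(mdp, policy, gamma)
        new_policy = list(policy)
        improved = False
        for s, actions in enumerate(mdp):
            # One-step lookahead value of each action under the current values.
            q = [reward + gamma * v[nxt] for nxt, reward in actions]
            best = max(range(len(actions)), key=q.__getitem__)
            if q[best] > q[policy[s]] + tol:  # strictly positive gain
                new_policy[s] = best
                improved = True
        if not improved:
            return policy, v, iterations
        policy = new_policy
        iterations += 1


# Toy 3-state instance (hypothetical): states 0 and 1 can stay or move right,
# state 2 has a single rewarding self-loop.
toy: DMDP = [
    [(0, 0.0), (1, 1.0)],
    [(1, 0.0), (2, 2.0)],
    [(2, 1.0)],
]
policy, values, iterations = howard_pi(toy, gamma=0.9)
print(policy, iterations)
```

On this toy instance both non-terminal states switch in the same iteration, which is the behaviour that distinguishes Howard-style switching from variants that change one state at a time.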