Upper Bounds for All and Max-gain Policy Iteration Algorithms on Deterministic MDPs

Goenka, Ritesh; Gupta, Eashan; Khyalia, Sushil; Agarwal, Pratyush; Wajid, Mulinti Shaik; Kalyanakrishnan, Shivaram

(Submitted on 28 Nov 2022 (v1), last revised 8 Oct 2023 (this version, v2))

Abstract: Policy Iteration (PI) is a widely used family of algorithms to compute optimal policies for Markov Decision Problems (MDPs). We derive upper bounds on the running time of PI on Deterministic MDPs (DMDPs): the class of MDPs in which every state-action pair has a unique next state. Our results include a non-trivial upper bound that applies to the entire family of PI algorithms; another to all "max-gain" switching variants; and affirmation that a conjecture regarding Howard's PI on MDPs is true for DMDPs. Our analysis is based on certain graph-theoretic results, which may be of independent interest.
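To make the objects named in the abstract concrete, here is a minimal sketch of Howard's PI on a small DMDP. It is an illustration under stated assumptions, not the authors' implementation: it assumes a discounted-reward criterion (the abstract does not state the criterion), and the names `evaluate`, `howard_pi`, `next_state`, `reward` and `gamma` are hypothetical. Each improvable state is switched to a max-gain action, which is one common way to instantiate Howard's PI.

```python
from fractions import Fraction


def evaluate(policy, next_state, reward, gamma):
    """Exact discounted values of a deterministic policy on a DMDP.

    Because every state-action pair has a unique next state, following the
    policy from any state traces a path that ends in a cycle.
    """
    n = len(policy)
    values = [None] * n
    for start in range(n):
        if values[start] is not None:
            continue
        # Walk until we reach an already evaluated state or revisit a state.
        path, position = [], {}
        s = start
        while values[s] is None and s not in position:
            position[s] = len(path)
            path.append(s)
            s = next_state[s][policy[s]]
        if values[s] is None:
            # A new cycle was closed at s; sum one lap in closed form.
            cycle = path[position[s]:]
            total, discount = Fraction(0), Fraction(1)
            for c in cycle:
                total += discount * reward[c][policy[c]]
                discount *= gamma
            values[s] = total / (1 - discount)
        # Back up along the path: V(u) = r(u, pi(u)) + gamma * V(next).
        for u in reversed(path):
            if values[u] is None:
                nxt = next_state[u][policy[u]]
                values[u] = reward[u][policy[u]] + gamma * values[nxt]
    return values


def howard_pi(next_state, reward, gamma, policy=None):
    """Switch every improvable state (to a max-gain action) until none remain."""
    n = len(next_state)
    policy = list(policy) if policy is not None else [0] * n
    iterations = 0
    while True:
        values = evaluate(policy, next_state, reward, gamma)
        new_policy = list(policy)
        improved = False
        for s in range(n):
            def q(a):
                return reward[s][a] + gamma * values[next_state[s][a]]
            best = max(range(len(next_state[s])), key=q)
            if q(best) > values[s]:  # strictly positive gain
                new_policy[s] = best
                improved = True
        if not improved:
            return policy, values, iterations
        policy = new_policy
        iterations += 1


# Hypothetical 2-state, 2-action DMDP: next_state[s][a] and reward[s][a].
next_state = [[0, 1], [0, 1]]
reward = [[0, 1], [2, 0]]
policy, values, iterations = howard_pi(next_state, reward, Fraction(1, 2))
```

Using `Fraction` keeps policy evaluation exact, so the strict-improvement test cannot be confused by floating-point rounding. The bounds in the paper concern how many such iterations any PI variant can take on a DMDP; this sketch only counts them for one instance.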

Comments:	Added new bounds for two-state MDPs
Subjects:	Discrete Mathematics (cs.DM); Computational Complexity (cs.CC); Combinatorics (math.CO)
MSC classes:	90C40 (Primary) 68Q25, 05C35, 05C38 (Secondary)
Cite as:	arXiv:2211.15602 [cs.DM]
	(or arXiv:2211.15602v2 [cs.DM] for this version)

Submission history

From: Ritesh Goenka
[v1] Mon, 28 Nov 2022 17:56:30 GMT (31kb)
[v2] Sun, 8 Oct 2023 20:19:31 GMT (51kb)
