Title: The Geometry of Memoryless Stochastic Policy Optimization in Infinite-Horizon POMDPs

Abstract: We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces with respect to either the discounted or mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, whereby the degree is determined by the degree of partial observability. We then describe the optimization problem as a linear optimization problem in the space of feasible state-action frequencies subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we demonstrate how the partial observability constraints can lead to multiple smooth and non-smooth local optimizers and we estimate the number of critical points.
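To make the objects in the abstract concrete, here is a minimal sketch of how discounted state-action frequencies can be computed for a memoryless stochastic policy in a toy POMDP. It is a standard textbook-style construction, not code from the paper. All names (`alpha` for the transition kernel, `beta` for the observation kernel, `pi` for the policy, `mu` for the initial distribution, `r` for the reward table) and the two-state example are assumptions chosen for illustration, as is the normalization by (1 - gamma) so that the frequencies sum to one. Exact `Fraction` arithmetic is used so that the outputs are visibly rational in the policy entries, and the reward is evaluated as a linear function of the frequencies.

```python
from fractions import Fraction as F


def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination on Fractions."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        # Pick a row with a nonzero pivot and move it into position c.
        p = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]


def state_action_frequencies(alpha, beta, pi, mu, gamma):
    """Discounted state-action frequencies eta[s][a] of a memoryless policy.

    alpha[s][a][t]: probability of moving from state s to t under action a
    beta[s][o]:     probability of observing o in state s
    pi[o][a]:       probability of choosing action a after observing o
    """
    nS, nA, nO = len(alpha), len(alpha[0]), len(pi)
    # Effective state policy: tau[s][a] = sum_o beta[s][o] * pi[o][a].
    tau = [[sum(beta[s][o] * pi[o][a] for o in range(nO)) for a in range(nA)]
           for s in range(nS)]
    # State transition matrix induced by the policy.
    P = [[sum(tau[s][a] * alpha[s][a][t] for a in range(nA)) for t in range(nS)]
         for s in range(nS)]
    # Discounted state distribution: (I - gamma P^T) rho = (1 - gamma) mu.
    A = [[(1 if s == t else 0) - gamma * P[t][s] for t in range(nS)]
         for s in range(nS)]
    rho = solve(A, [(1 - gamma) * m for m in mu])
    return [[rho[s] * tau[s][a] for a in range(nA)] for s in range(nS)]


def expected_reward(eta, r):
    """Normalized discounted reward, linear in the frequencies eta."""
    return sum(e * x for row_e, row_r in zip(eta, r) for e, x in zip(row_e, row_r))


# Hypothetical toy POMDP: action 0 stays, action 1 switches state,
# and the observation reveals the true state with probability 3/4.
alpha = [[[F(1), F(0)], [F(0), F(1)]],
         [[F(0), F(1)], [F(1), F(0)]]]
beta = [[F(3, 4), F(1, 4)], [F(1, 4), F(3, 4)]]
r = [[F(1), F(0)], [F(0), F(1)]]
mu = [F(1, 2), F(1, 2)]
gamma = F(1, 2)

pi = [[F(2, 3), F(1, 3)], [F(1, 5), F(4, 5)]]
eta = state_action_frequencies(alpha, beta, pi, mu, gamma)
print(eta, expected_reward(eta, r))
```

Once the frequencies are computed, the reward is just an inner product with `r`; the nonlinearity of the problem sits entirely in which frequency tables are reachable by a memoryless policy, which is the feasible set the abstract describes through polynomial constraints.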
Comments: Preprint, 37 pages, 5 figures
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Algebraic Geometry (math.AG)
MSC classes: 90C40, 93E20, 49M37, 90C23
Cite as: arXiv:2110.07409 [math.OC]
  (or arXiv:2110.07409v2 [math.OC] for this version)

Submission history

From: Johannes Müller
[v1] Thu, 14 Oct 2021 14:42:09 GMT (219kb,D)
[v2] Fri, 15 Oct 2021 13:34:42 GMT (219kb,D)
