Title: A model-free first-order method for linear quadratic regulator with $\tilde{O}(1/\varepsilon)$ sampling complexity

Abstract: We consider the classic stochastic linear quadratic regulator (LQR) problem under an infinite horizon average stage cost. By leveraging recent policy gradient methods from reinforcement learning, we obtain a first-order method that finds a stabilizing matrix gain whose objective function gap is at most $\varepsilon$ with high probability using $\tilde{O}(1/\varepsilon)$ samples, where $\tilde{O}$ hides polylogarithmic dependence on $\varepsilon$. Our proposed method seems to have the best dependence on $\varepsilon$ within the model-free literature and matches the best known rate from the model-based literature, up to logarithmic factors. Our developments that result in this improved sampling complexity fall in the category of actor-critic algorithms. The actor part involves a variational inequality formulation of the stochastic LQR problem and ensuing application of a projected operator method, while in the critic part we utilize a conditional stochastic primal-dual method and show that the algorithm has the optimal rate of convergence when paired with a shrinking multi-epoch scheme.
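As a point of reference for the "objective function gap" mentioned in the abstract, the following is a minimal sketch of how the average stage cost of a fixed stabilizing gain can be evaluated when the model is known. It is not the paper's model-free actor-critic method. The dynamics x_{t+1} = A x_t + B u_t + w_t with u_t = -K x_t, the noise covariance W, the cost matrices Q and R, and all function names are assumptions introduced here for illustration.

```python
import numpy as np

def average_cost(A, B, Q, R, W, K):
    """Average stage cost of u = -K x; returns inf if K is not stabilizing.

    Assumed (hypothetical) setup: x_{t+1} = A x_t + B u_t + w_t, Cov(w_t) = W,
    stage cost x'Qx + u'Ru. For stabilizing K the average cost is tr(P W),
    where P solves the Lyapunov equation P = Q + K'RK + L'PL with L = A - BK.
    """
    L = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(L))) >= 1.0:
        return np.inf  # closed loop is unstable, average cost is unbounded
    n = A.shape[0]
    M = Q + K.T @ R @ K
    # Vectorized Lyapunov solve: (I - L' kron L') vec(P) = vec(M)
    lhs = np.eye(n * n) - np.kron(L.T, L.T)
    P = np.linalg.solve(lhs, M.flatten()).reshape(n, n)
    return float(np.trace(P @ W))

def optimal_gain(A, B, Q, R, W, iters=500):
    """Model-based baseline: fixed-point iteration on the Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        G = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ G)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, float(np.trace(P @ W))

if __name__ == "__main__":
    # Scalar toy instance (made-up numbers, for illustration only)
    A = np.array([[0.5]]); B = np.array([[1.0]])
    Q = np.array([[1.0]]); R = np.array([[1.0]]); W = np.array([[1.0]])
    K_star, J_star = optimal_gain(A, B, Q, R, W)
    gap = average_cost(A, B, Q, R, W, np.zeros((1, 1))) - J_star
    print("objective gap of K = 0:", gap)
```

In this sketch the gap of a candidate gain is its average cost minus the Riccati-based optimum. A model-free method would have to estimate these quantities from sampled trajectories rather than from A and B.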
Comments: Pre-print. 23 pages, 1 figure. Comments are welcome
Subjects: Optimization and Control (math.OC)
MSC classes: 93C05, 65K05
Cite as: arXiv:2212.00084 [math.OC]
  (or arXiv:2212.00084v1 [math.OC] for this version)

Submission history

From: Caleb Ju
[v1] Wed, 30 Nov 2022 19:54:32 GMT (178kb,D)