Title: Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

Abstract: In this paper we study a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naive discretization and piecewise approximation with discrete-time RL algorithms yield linear regret with respect to the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on regularized least-squares estimation, and establish a sublinear regret bound of order $\tilde{O}(\sqrt{N})$. The analysis consists of two parts: parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity property.
Comments: 25 pages
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2006.15316 [math.OC]
  (or arXiv:2006.15316v2 [math.OC] for this version)

Submission history

From: Anran Hu
[v1] Sat, 27 Jun 2020 08:14:59 GMT (25kb)
[v2] Tue, 10 Nov 2020 07:14:07 GMT (29kb)
[v3] Mon, 17 May 2021 20:00:45 GMT (30kb)
[v4] Fri, 17 Jun 2022 18:48:26 GMT (45kb)
