Mathematics > Optimization and Control
Title: Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework
(Submitted on 27 Jun 2020 (v1), revised 10 Nov 2020 (this version, v2), latest version 17 Jun 2022 (v4))
Abstract: In this paper we study a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yield a linear regret with respect to the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on regularized least-squares estimation, and establish a sublinear regret bound of order $\tilde{O}(\sqrt{N})$. The analysis consists of two parts: an analysis of the parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and a perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity property.
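The abstract does not spell out the estimator or the Riccati equation. As a minimal sketch, assume a scalar model dx = (a x + b u) dt + σ dW with unknown (a, b), a running cost q x² + r u² on [0, T], and no terminal cost. The Python below shows one common form of continuous-time regularized least-squares estimation, theta_hat = (Σ ∫ z zᵀ dt + λI)⁻¹ Σ ∫ z dx with z = (x, u), followed by a plug-in feedback gain from the scalar Riccati ODE. Integrals are approximated by Euler-Maruyama sums. All function names, parameter values, and the fixed per-episode exploration gains are illustrative assumptions, not the paper's algorithm.

```python
import math
import random


def simulate_episode(a, b, sigma, gain, x0, T, n_steps, rng):
    """Euler-Maruyama simulation of dx = (a x + b u) dt + sigma dW on [0, T]
    under the linear feedback u(t) = -gain(t) * x(t)."""
    dt = T / n_steps
    xs, us = [x0], []
    for k in range(n_steps):
        x = xs[-1]
        u = -gain(k * dt) * x
        dw = rng.gauss(0.0, math.sqrt(dt))
        xs.append(x + (a * x + b * u) * dt + sigma * dw)
        us.append(u)
    return xs, us, dt


def regularized_ls(episodes, lam):
    """theta_hat = (sum_i int z z^T dt + lam I)^{-1} sum_i int z dx, z = (x, u),
    with both integrals replaced by left-point (Ito) sums on the sampling grid."""
    s11, s12, s22 = lam, 0.0, lam  # regularized Gram matrix entries
    r1 = r2 = 0.0                  # entries of sum int z dx
    for xs, us, dt in episodes:
        for k, u in enumerate(us):
            x, dx = xs[k], xs[k + 1] - xs[k]
            s11 += x * x * dt
            s12 += x * u * dt
            s22 += u * u * dt
            r1 += x * dx
            r2 += u * dx
    det = s11 * s22 - s12 * s12
    # Closed-form inverse of the 2x2 symmetric Gram matrix applied to (r1, r2).
    return (s22 * r1 - s12 * r2) / det, (s11 * r2 - s12 * r1) / det


def riccati_gain(a, b, q, r, T, n_steps):
    """Integrate -dP/dt = 2 a P - (b^2 / r) P^2 + q, P(T) = 0, backward with
    explicit Euler and return the feedback gain k(t) = b P(t) / r."""
    dt = T / n_steps
    P = [0.0] * (n_steps + 1)
    for k in range(n_steps, 0, -1):
        p = P[k]
        P[k - 1] = p + (2 * a * p - (b * b / r) * p * p + q) * dt
    return lambda t: b * P[min(int(t / dt + 1e-9), n_steps)] / r


# Demo with made-up true parameters (hypothetical values).
A_TRUE, B_TRUE, SIGMA, T, N_STEPS = 0.5, 1.0, 0.3, 1.0, 100
rng = random.Random(0)
# Assumption: cycle through fixed gains so (x, u) is not collinear across episodes.
episodes = [
    simulate_episode(A_TRUE, B_TRUE, SIGMA, lambda t, g=g: g, 1.0, T, N_STEPS, rng)
    for _ in range(100) for g in (0.0, 1.0, 2.0)
]
a_hat, b_hat = regularized_ls(episodes, lam=0.1)
k_hat = riccati_gain(a_hat, b_hat, q=1.0, r=1.0, T=T, n_steps=N_STEPS)
k_true = riccati_gain(A_TRUE, B_TRUE, q=1.0, r=1.0, T=T, n_steps=N_STEPS)
print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}")
print(f"gain at t=0: estimated {k_hat(0.0):.3f}, true {k_true(0.0):.3f}")
```

Under these assumptions the estimates should land near the true (0.5, 1.0), and the plug-in gain near the gain computed from the true parameters. How such estimation errors propagate through the Riccati equation is what the perturbation analysis in the abstract addresses.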
Submission history
From: Anran Hu
[v1] Sat, 27 Jun 2020 08:14:59 GMT (25kb)
[v2] Tue, 10 Nov 2020 07:14:07 GMT (29kb)
[v3] Mon, 17 May 2021 20:00:45 GMT (30kb)
[v4] Fri, 17 Jun 2022 18:48:26 GMT (45kb)