Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

Basei, Matteo; Guo, Xin; Hu, Anran

(Submitted on 27 Jun 2020 (v1), revised 10 Nov 2020 (this version, v2), latest version 17 Jun 2022 (v4))

Abstract: In this paper we study a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yield regret that is linear in the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on regularized least-squares estimation, and establish a sublinear regret bound of order $\tilde{O}(\sqrt{N})$. The analysis consists of two parts: parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity property.
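The perturbation step mentioned in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a scalar model $dX_t = (aX_t + bU_t)\,dt + dW_t$ with cost $\int_0^T (qX_t^2 + rU_t^2)\,dt + q_T X_T^2$, integrates the corresponding scalar Riccati equation backward in time with a hand-written RK4 scheme, and compares the solution at $t=0$ for the "true" drift parameter and a slightly perturbed estimate. The function name `riccati_p0`, the symbols, and all numerical values are illustrative assumptions.

```python
import math

def riccati_p0(a, b, q, r, q_T, T, steps=2000):
    """Return P(0) for the scalar Riccati ODE, integrated backward with RK4.

    Assumed scalar form: dP/dt = -(2*a*P - (b**2 / r) * P**2 + q), P(T) = q_T.
    """
    def rhs_reversed(p):
        # In reversed time s = T - t the ODE reads dP/ds = 2aP - (b^2/r)P^2 + q.
        return 2.0 * a * p - (b * b / r) * p * p + q

    h = T / steps
    p = q_T  # terminal condition becomes the initial value in reversed time
    for _ in range(steps):
        k1 = rhs_reversed(p)
        k2 = rhs_reversed(p + 0.5 * h * k1)
        k3 = rhs_reversed(p + 0.5 * h * k2)
        k4 = rhs_reversed(p + h * k3)
        p += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

# Illustrative (assumed) parameters: "true" drift a = 0.5 and an estimate off by 1e-3.
a_true, b_true = 0.5, 1.0
p_true = riccati_p0(a_true, b_true, q=1.0, r=1.0, q_T=0.0, T=10.0)
p_est = riccati_p0(a_true + 1e-3, b_true, q=1.0, r=1.0, q_T=0.0, T=10.0)

# Gap between the two Riccati solutions at t = 0; it stays small when the
# parameter error is small, which is the kind of robustness the analysis needs.
gap = abs(p_est - p_true)
```

In this toy case the feedback gain at time 0 is $b\,P(0)/r$, so a small `gap` means a controller built from the estimated parameter stays close to the one built from the true parameter. For a long horizon, `p_true` should approach the positive root of the algebraic equation $(b^2/r)P^2 - 2aP - q = 0$.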

Comments:	25 pages
Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.15316 [math.OC]
	(or arXiv:2006.15316v2 [math.OC] for this version)

Submission history

From: Anran Hu
[v1] Sat, 27 Jun 2020 08:14:59 GMT (25kb)
[v2] Tue, 10 Nov 2020 07:14:07 GMT (29kb)
[v3] Mon, 17 May 2021 20:00:45 GMT (30kb)
[v4] Fri, 17 Jun 2022 18:48:26 GMT (45kb)
