Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator

Tu, Stephen; Recht, Benjamin

(Submitted on 22 Dec 2017)

Abstract: Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite its impressive results, however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic Regulator (LQR) problem from optimal control. We give the first finite-time analysis of the number of samples needed to estimate the value function for a fixed static state-feedback policy to within $\varepsilon$-relative error. In the process of deriving our result, we give a general characterization for when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero, extending a result by Koltchinskii and Mendelson in the independent covariates setting. Finally, we provide experimental evidence indicating that our analysis correctly captures the qualitative behavior of LSTD on several LQR instances.
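
For orientation, the setting in the abstract (LSTD estimating the value function of a fixed state-feedback policy on LQR) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not necessarily the estimator or feature map analyzed in the paper: it assumes a discounted cost with factor `gamma`, dynamics `x' = A x + B u + w` with isotropic Gaussian noise, a policy `u = K x`, and features made of the quadratic monomials of the state plus a constant. All function names (`features`, `lstd`, `weights_to_P`, `true_P`, `rollout`) and the example matrices are hypothetical.

```python
import numpy as np


def features(x):
    """Quadratic monomials x_i * x_j (i <= j) plus a constant feature."""
    iu = np.triu_indices(x.shape[0])
    return np.append(np.outer(x, x)[iu], 1.0)


def lstd(states, costs, next_states, gamma):
    """Solve the LSTD equations A w = b for the value-function weights w."""
    Phi = np.array([features(x) for x in states])
    Phi_next = np.array([features(x) for x in next_states])
    A = Phi.T @ (Phi - gamma * Phi_next)
    b = Phi.T @ costs
    return np.linalg.lstsq(A, b, rcond=None)[0]


def weights_to_P(w, n):
    """Rebuild the symmetric matrix P in V(x) = x' P x + const from w."""
    U = np.zeros((n, n))
    U[np.triu_indices(n)] = w[:-1]
    # An off-diagonal weight multiplies x_i * x_j once, so it equals 2 * P_ij.
    return (U + U.T) / 2.0


def true_P(L, M, gamma):
    """Solve the discounted Lyapunov equation P = M + gamma * L' P L."""
    n = L.shape[0]
    lhs = np.eye(n * n) - gamma * np.kron(L.T, L.T)
    return np.linalg.solve(lhs, M.flatten()).reshape(n, n)


def rollout(A, B, K, Q, R, sigma, T, rng):
    """Simulate T steps of the closed loop under u = K x, starting at 0."""
    n = A.shape[0]
    L = A + B @ K
    M = Q + K.T @ R @ K  # per-step cost is x' M x under the policy
    xs = np.zeros((T + 1, n))
    for t in range(T):
        xs[t + 1] = L @ xs[t] + sigma * rng.standard_normal(n)
    costs = np.einsum("ti,ij,tj->t", xs[:-1], M, xs[:-1])
    return xs[:-1], costs, xs[1:]


if __name__ == "__main__":
    # Illustrative (assumed) system: a damped double integrator.
    rng = np.random.default_rng(0)
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    K = np.array([[-1.0, -2.0]])
    Q, R, gamma = np.eye(2), np.eye(1), 0.9

    X, c, X_next = rollout(A, B, K, Q, R, sigma=1.0, T=5000, rng=rng)
    P_hat = weights_to_P(lstd(X, c, X_next, gamma), 2)
    P = true_P(A + B @ K, Q + K.T @ R @ K, gamma)
    print("relative error:", np.linalg.norm(P_hat - P) / np.linalg.norm(P))
```

With these features the value function is exactly linear in the weights, so in the noise-free case the estimate matches the Lyapunov solution up to numerical precision. With noise, the error depends on the trajectory length, which is the quantity the finite-time analysis is concerned with.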

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1712.08642 [cs.LG]
	(or arXiv:1712.08642v1 [cs.LG] for this version)

Submission history

From: Stephen Tu
[v1] Fri, 22 Dec 2017 20:12:07 GMT (66kb,D)
