Adaptive Lambda Least-Squares Temporal Difference Learning

Mann, Timothy A.; Penedones, Hugo; Mannor, Shie; Hester, Todd

(Submitted on 30 Dec 2016)

Abstract: Temporal Difference learning, or TD($\lambda$), is a fundamental algorithm in the field of reinforcement learning. However, setting TD's $\lambda$ parameter, which controls the timescale of TD updates, is generally left up to the practitioner. We formalize the $\lambda$ selection problem as a bias-variance trade-off where the solution is the value of $\lambda$ that leads to the smallest Mean Squared Value Error (MSVE). To solve this trade-off, we suggest applying Leave-One-Trajectory-Out Cross-Validation (LOTO-CV) to search the space of $\lambda$ values. Unfortunately, this approach is too computationally expensive for most practical applications. For Least-Squares TD (LSTD), we show that LOTO-CV can be implemented efficiently to automatically tune $\lambda$, and we apply function optimization methods to efficiently search the space of $\lambda$ values. The resulting algorithm, ALLSTD, is parameter-free, and our experiments demonstrate that ALLSTD is significantly faster computationally than the naïve LOTO-CV implementation while achieving similar performance.
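The naïve LOTO-CV search over $\lambda$ described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' ALLSTD implementation. It assumes linear features, episodic trajectories that terminate after their last step, a small ridge term for numerical stability, and the held-out trajectory's discounted Monte Carlo returns as the target when estimating the MSVE. It also uses a plain grid over $\lambda$ in place of the function optimization methods the paper applies. All function names are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def trajectory_stats(phis, rewards, gamma, lam):
    """Accumulate one trajectory's LSTD(lambda) statistics A_i and b_i.

    phis: (T, d) array of state features; rewards: (T,) array of rewards.
    Assumption: the episode terminates after step T-1 (next feature is zero).
    """
    T, d = phis.shape
    A = np.zeros((d, d))
    b = np.zeros(d)
    z = np.zeros(d)  # accumulating eligibility trace
    for t in range(T):
        phi_next = phis[t + 1] if t + 1 < T else np.zeros(d)
        z = gamma * lam * z + phis[t]
        A += np.outer(z, phis[t] - gamma * phi_next)
        b += z * rewards[t]
    return A, b


def mc_returns(rewards, gamma):
    """Discounted Monte Carlo return observed from every time step."""
    G = np.zeros(len(rewards))
    acc = 0.0
    for t in reversed(range(len(rewards))):
        acc = rewards[t] + gamma * acc
        G[t] = acc
    return G


def loto_cv_error(trajs, gamma, lam, ridge=1e-6):
    """Leave-one-trajectory-out squared value error for a single lambda.

    Assumption: the held-out trajectory's Monte Carlo returns stand in for
    the true values when estimating the MSVE.
    """
    stats = [trajectory_stats(p, r, gamma, lam) for p, r in trajs]
    A_tot = sum(A for A, _ in stats)
    b_tot = sum(b for _, b in stats)
    d = A_tot.shape[0]
    errors = []
    for (phis, rewards), (A_i, b_i) in zip(trajs, stats):
        # Fit on all other trajectories by subtracting the held-out statistics.
        theta = np.linalg.solve(A_tot - A_i + ridge * np.eye(d), b_tot - b_i)
        errors.append(np.mean((phis @ theta - mc_returns(rewards, gamma)) ** 2))
    return float(np.mean(errors))


def select_lambda(trajs, gamma, lambdas):
    """Grid search (a stand-in for the paper's optimizer) for the best lambda."""
    return min(lambdas, key=lambda lam: loto_cv_error(trajs, gamma, lam))
```

For example, with trajectories given as (features, rewards) pairs, `select_lambda(trajs, 0.99, [0.0, 0.25, 0.5, 0.75, 1.0])` returns the grid value with the lowest held-out error. Because the statistics $A_i$ and $b_i$ are additive over trajectories, each fold subtracts one trajectory's contribution rather than re-accumulating everything. However, the sketch still performs one linear solve per trajectory for every candidate $\lambda$. The abstract states that, for LSTD, LOTO-CV can be implemented more efficiently than this.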

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1612.09465 [cs.LG]
	(or arXiv:1612.09465v1 [cs.LG] for this version)

Submission history

From: Hugo Penedones
[v1] Fri, 30 Dec 2016 11:51:14 GMT (170kb,D)