Parameter-free Gradient Temporal Difference Learning

Jacobsen, Andrew; Chan, Alan

(Submitted on 10 May 2021)

Abstract: Reinforcement learning lies at the intersection of several challenges. Many applications of interest involve extremely large state spaces, requiring function approximation to enable tractable computation. In addition, the learner has only a single stream of experience with which to evaluate a large number of possible courses of action, necessitating algorithms which can learn off-policy. However, the combination of off-policy learning with function approximation leads to divergence of temporal difference methods. Recent work into gradient-based temporal difference methods has promised a path to stability, but at the cost of expensive hyperparameter tuning. In parallel, progress in online learning has provided parameter-free methods that achieve minimax optimal guarantees up to logarithmic terms, but their application in reinforcement learning has yet to be explored. In this work, we combine these two lines of attack, deriving parameter-free, gradient-based temporal difference algorithms. Our algorithms run in linear time and achieve high-probability convergence guarantees matching those of GTD2 up to $\log$ factors. Our experiments demonstrate that our methods maintain high prediction performance relative to fully-tuned baselines, with no tuning whatsoever.

Comments:	30 pages, 10 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2105.04129 [cs.LG]
	(or arXiv:2105.04129v1 [cs.LG] for this version)

Submission history

From: Andrew Jacobsen
[v1] Mon, 10 May 2021 06:07:05 GMT (6273kb,D)
