Statistics > Machine Learning
Title: Beyond Target Networks: Improving Deep $Q$-learning with Functional Regularization
(Submitted on 4 Jun 2021 (v1), last revised 1 Feb 2022 (this version, v3))
Abstract: Many of the recent successes in Deep Reinforcement Learning have been based on minimizing the squared Bellman error. However, training is often unstable due to fast-changing target Q-values, and target networks are employed to regularize the Q-value estimation and stabilize training by using an additional set of lagging parameters. Despite their advantages, target networks are potentially an inflexible way to regularize Q-values, which may ultimately slow down training. In this work, we address this issue by augmenting the squared Bellman error with a functional regularizer. Unlike target networks, the regularization we propose here is explicit and enables us to use up-to-date parameters as well as control the regularization. This leads to a faster yet more stable training method. We analyze the convergence of our method theoretically and empirically validate our predictions on simple environments as well as on a suite of Atari environments. We demonstrate empirical improvements over target-network-based methods in terms of both sample efficiency and performance. In summary, our approach provides a fast and stable alternative to replace the standard squared Bellman error.
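The contrast the abstract draws between a target-network loss and a regularized squared Bellman error can be sketched on a tabular Q-function. This is an illustrative sketch only: the abstract does not give the regularizer's exact form, so the squared distance between the current and lagging Q-values, the weight `kappa`, and the function names `target_network_loss` and `functional_reg_loss` are all assumptions made here for illustration.

```python
import numpy as np

def target_network_loss(q_online, q_lagging, batch, gamma=0.99):
    """Squared Bellman error with the bootstrap target taken from lagging values.

    q_online, q_lagging: arrays of shape (n_states, n_actions).
    batch: tuple (s, a, r, s_next, done) of 1-D arrays of equal length.
    """
    s, a, r, s_next, done = batch
    # The target is computed from the lagging (target-network) Q-values.
    target = r + gamma * (1.0 - done) * q_lagging[s_next].max(axis=1)
    return float(np.mean((q_online[s, a] - target) ** 2))

def functional_reg_loss(q_online, q_lagging, batch, gamma=0.99, kappa=1.0):
    """Squared Bellman error with up-to-date targets plus an explicit penalty.

    ASSUMPTION: the penalty is kappa * (Q(s, a) - Q_lagging(s, a))^2; the
    abstract only says the regularizer is explicit and controllable.
    """
    s, a, r, s_next, done = batch
    # The target uses the current Q-values (treated as a constant when
    # differentiating in a gradient-based implementation).
    target = r + gamma * (1.0 - done) * q_online[s_next].max(axis=1)
    q_sa = q_online[s, a]
    bellman = (q_sa - target) ** 2
    # Explicit penalty keeping current Q-values close to the lagging ones.
    penalty = (q_sa - q_lagging[s, a]) ** 2
    return float(np.mean(bellman + kappa * penalty))

# Tiny hypothetical example: 2 states, 2 actions, a single transition.
q_online = np.array([[1.0, 2.0], [3.0, 4.0]])
q_lagging = np.zeros((2, 2))
batch = (
    np.array([0]),    # state
    np.array([1]),    # action
    np.array([1.0]),  # reward
    np.array([1]),    # next state
    np.array([0.0]),  # done flag
)
loss_tn = target_network_loss(q_online, q_lagging, batch, gamma=0.5)
loss_fr = functional_reg_loss(q_online, q_lagging, batch, gamma=0.5, kappa=2.0)
```

In this sketch, `kappa` plays the role of the controllable regularization strength mentioned in the abstract: setting it to zero recovers the plain squared Bellman error with up-to-date targets.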
Submission history
From: Alexandre Piché
[v1] Fri, 4 Jun 2021 17:21:07 GMT (5678kb,D)
[v2] Mon, 7 Jun 2021 20:23:18 GMT (5679kb,D)
[v3] Tue, 1 Feb 2022 20:26:11 GMT (8937kb,D)