Title: Beyond Target Networks: Improving Deep $Q$-learning with Functional Regularization

Abstract: Many of the recent successes in Deep Reinforcement Learning have been based on minimizing the squared Bellman error. However, training is often unstable due to fast-changing target Q-values, and target networks are employed to regularize the Q-value estimation and stabilize training by using an additional set of lagging parameters. Despite their advantages, target networks are potentially an inflexible way to regularize Q-values, which may ultimately slow down training. In this work, we address this issue by augmenting the squared Bellman error with a functional regularizer. Unlike target networks, the regularization we propose here is explicit and enables us to use up-to-date parameters as well as control the regularization. This leads to a faster yet more stable training method. We analyze the convergence of our method theoretically and empirically validate our predictions on simple environments as well as on a suite of Atari environments. We demonstrate empirical improvements over target-network-based methods in terms of both sample efficiency and performance. In summary, our approach provides a fast and stable alternative to replace the standard squared Bellman error.
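
The abstract does not state the loss itself, so the following is only a minimal sketch of one way to read "augmenting the squared Bellman error with a functional regularizer". It assumes a tabular Q-function stored as a NumPy array, a bootstrapped target computed from the up-to-date Q-values, and a quadratic penalty on the gap between the current Q-values and a lagging copy. The names `fr_loss`, `q_prior`, and `kappa` are hypothetical, and the exact form used in the paper may differ.

```python
import numpy as np

def fr_loss(q, q_prior, s, a, r, s_next, done, gamma=0.99, kappa=1.0):
    """Squared Bellman error plus a function-space penalty (illustrative only).

    q, q_prior: arrays of shape (n_states, n_actions); q_prior is a lagging copy.
    s, a, s_next: integer index arrays of shape (batch,).
    r, done: float arrays of shape (batch,); done is 1.0 at terminal transitions.
    kappa: assumed regularization weight (hypothetical name).
    """
    # Target built from the up-to-date Q-values, not from a target network.
    target = r + gamma * (1.0 - done) * q[s_next].max(axis=1)
    q_sa = q[s, a]
    bellman = (q_sa - target) ** 2
    # Explicit penalty keeping current Q-values close to the lagging ones.
    reg = kappa * (q_sa - q_prior[s, a]) ** 2
    return float(np.mean(bellman + reg))
```

In this sketch, setting `kappa=0.0` recovers a plain squared Bellman error with no lagging parameters, while larger values tie the current Q-values more tightly to the lagging copy. In a gradient-based setting the target would be treated as a constant.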
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2106.02613 [stat.ML]
  (or arXiv:2106.02613v3 [stat.ML] for this version)

Submission history

From: Alexandre Piché
[v1] Fri, 4 Jun 2021 17:21:07 GMT (5678kb,D)
[v2] Mon, 7 Jun 2021 20:23:18 GMT (5679kb,D)
[v3] Tue, 1 Feb 2022 20:26:11 GMT (8937kb,D)