
Title: Sample Complexity of Offline Reinforcement Learning with Deep ReLU Networks

Abstract: Offline reinforcement learning (RL) leverages previously collected data for policy optimization without any further active exploration. Despite recent interest in this problem, theoretical results in the neural network function approximation setting remain limited. In this paper, we study the statistical theory of offline RL with deep ReLU network function approximation. In particular, we establish a sample complexity of $\tilde{\mathcal{O}}\left( \kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha} \right)$ for offline RL with deep ReLU networks, where $\kappa$ is a measure of distributional shift, $d$ is the dimension of the state-action space, $\alpha$ is a (possibly fractional) smoothness parameter of the underlying Markov decision process (MDP), and $\epsilon$ is a user-specified error. Notably, our sample complexity holds under two novel considerations: the Besov dynamic closure and the correlated structure that arises from value regression for offline RL. While the Besov dynamic closure generalizes the dynamic conditions for offline RL in prior works, the correlated structure renders the analyses in prior works on offline RL with general or neural network function approximation improper or inefficient. To the best of our knowledge, this is the first theoretical characterization of the sample complexity of offline RL with deep neural network function approximation under the general Besov regularity condition, which goes beyond traditional reproducing kernel Hilbert spaces and neural tangent kernels.
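To make the scaling in the stated bound concrete, the sketch below evaluates only its leading term $\kappa^{1 + d/\alpha} \cdot \epsilon^{-2 - 2d/\alpha}$. It is an illustration under stated assumptions, not the paper's bound: the constants and polylogarithmic factors hidden by $\tilde{\mathcal{O}}$ are not given in this abstract and are dropped, so only ratios between two evaluations are meaningful. The helper name `leading_order_samples` and the example values ($d = 4$, $\alpha = 2$) are hypothetical.

```python
import math


def leading_order_samples(kappa: float, eps: float, d: int, alpha: float) -> float:
    """Return kappa**(1 + d/alpha) * eps**(-2 - 2*d/alpha).

    Hypothetical helper: constants and the log factors hidden by tilde-O are
    dropped, so only ratios between two calls are meaningful.
    """
    if kappa <= 0 or not (0 < eps < 1) or d <= 0 or alpha <= 0:
        raise ValueError("expected kappa > 0, 0 < eps < 1, d > 0, alpha > 0")
    r = d / alpha  # dimension-to-smoothness ratio that drives both exponents
    return kappa ** (1 + r) * eps ** (-2 - 2 * r)


if __name__ == "__main__":
    base = leading_order_samples(kappa=1.0, eps=0.1, d=4, alpha=2.0)
    # Halving eps with d/alpha = 2 scales the term by 2**(2 + 2*2) = 2**6.
    halved_eps = leading_order_samples(1.0, 0.05, 4, 2.0) / base
    print(halved_eps, math.log2(halved_eps))
    # Doubling kappa scales the term by 2**(1 + 2) = 8.
    doubled_kappa = leading_order_samples(2.0, 0.1, 4, 2.0) / base
    print(doubled_kappa, math.log2(doubled_kappa))
```

With $d/\alpha = 2$, halving $\epsilon$ multiplies the leading term by $2^6 = 64$ and doubling $\kappa$ multiplies it by $2^3 = 8$. As $\alpha$ grows relative to $d$, the exponent on $\epsilon$ approaches $-2$ and the exponent on $\kappa$ approaches $1$.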
Comments: A short version was published in the ICML Workshop on Reinforcement Learning Theory, 2021.
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2103.06671 [stat.ML]
  (or arXiv:2103.06671v5 [stat.ML] for this version)

Submission history

From: Thanh Nguyen-Tang
[v1] Thu, 11 Mar 2021 14:01:14 GMT (266kb)
[v2] Tue, 22 Jun 2021 03:16:30 GMT (71kb)
[v3] Sun, 11 Jul 2021 16:04:28 GMT (71kb)
[v4] Mon, 15 Aug 2022 18:33:24 GMT (97kb)
[v5] Thu, 18 Aug 2022 00:56:28 GMT (97kb)
