Instance-optimality in optimal value estimation: Adaptivity via variance-reduced Q-learning

Khamaru, Koulik; Xia, Eric; Wainwright, Martin J.; Jordan, Michael I.

(Submitted on 28 Jun 2021)

Abstract: Various algorithms in reinforcement learning exhibit dramatic variability in their convergence rates and ultimate accuracy as a function of the problem structure. Such instance-specific behavior is not captured by existing global minimax bounds, which are worst-case in nature. We analyze the problem of estimating optimal $Q$-value functions for a discounted Markov decision process with discrete states and actions and identify an instance-dependent functional that controls the difficulty of estimation in the $\ell_\infty$-norm. Using a local minimax framework, we show that this functional arises in lower bounds on the accuracy of any estimation procedure. In the other direction, we establish the sharpness of our lower bounds, up to factors logarithmic in the state and action spaces, by analyzing a variance-reduced version of $Q$-learning. Our theory provides a precise way of distinguishing "easy" problems from "hard" ones in the context of $Q$-learning, as illustrated by an ensemble with a continuum of difficulty.

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2106.14352 [stat.ML]
	(or arXiv:2106.14352v1 [stat.ML] for this version)

Submission history

From: Eric Xia
[v1] Mon, 28 Jun 2021 00:38:54 GMT (78kb,D)
