Computer Science > Machine Learning
Title: Variance-Aware Off-Policy Evaluation with Linear Function Approximation
(Submitted on 22 Jun 2021 (v1), last revised 4 Jan 2022 (this version, v2))
Abstract: We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy. We propose to incorporate the variance information of the value function to improve the sample efficiency of OPE. More specifically, for time-inhomogeneous episodic linear Markov decision processes (MDPs), we propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration. We show that our algorithm achieves a tighter error bound than the best-known result. We also provide a fine-grained characterization of the distribution shift between the behavior policy and the target policy. Extensive numerical experiments corroborate our theory.
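The reweighting idea in the abstract can be illustrated with a variance-weighted ridge regression inside one backward step of Fitted Q-Iteration. This is only a minimal sketch under stated assumptions, not the VA-OPE algorithm as specified in the paper: the function names `weighted_ridge` and `fqi_backward_step`, the regularizer `lam`, and the inputs `sigma2` (per-sample variance estimates, assumed to be given and strictly positive) and `phi_next_pi` (next-state features averaged over the target policy's actions) are all hypothetical names chosen for illustration.

```python
import numpy as np

def weighted_ridge(phi, targets, sigma2, lam=1.0):
    """Solve min_w sum_k (phi_k^T w - y_k)^2 / sigma2_k + lam * ||w||^2.

    phi:     (n, d) feature matrix
    targets: (n,) regression targets
    sigma2:  (n,) assumed variance estimates, strictly positive
    """
    weights = 1.0 / sigma2                      # low-variance samples count more
    weighted_phi = phi * weights[:, None]       # scale each row by its weight
    gram = weighted_phi.T @ phi + lam * np.eye(phi.shape[1])
    rhs = weighted_phi.T @ targets
    return np.linalg.solve(gram, rhs)

def fqi_backward_step(phi, rewards, phi_next_pi, w_next, sigma2, lam=1.0):
    """One backward step: regress r + V_next onto the current-step features.

    phi_next_pi: (n, d) next-state features averaged under the target policy
    w_next:      (d,) weight vector already fitted for the next time step
    """
    targets = rewards + phi_next_pi @ w_next    # Bellman backup target
    return weighted_ridge(phi, targets, sigma2, lam)
```

Setting every entry of `sigma2` to 1 reduces this to ordinary ridge regression, which is the unweighted regression step in standard Fitted Q-Iteration.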
Submission history
From: Quanquan Gu
[v1] Tue, 22 Jun 2021 17:58:46 GMT (198kb,D)
[v2] Tue, 4 Jan 2022 01:39:34 GMT (205kb,D)