An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Xu, Yang; Zhu, Jin; Shi, Chengchun; Luo, Shikai; Song, Rui

(Submitted on 29 Dec 2022 (v1), last revised 2 Feb 2023 (this version, v2))

Abstract: Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Cite as: arXiv:2212.14468 [stat.ML] (or arXiv:2212.14468v2 [stat.ML] for this version)

Submission history

From: Yang Xu
[v1] Thu, 29 Dec 2022 22:06:51 GMT (538kb,D)
[v2] Thu, 2 Feb 2023 18:08:26 GMT (440kb,D)
