Title: An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Abstract: Off-policy evaluation (OPE) is a method for estimating the return of a target policy using pre-collected observational data generated by a potentially different behavior policy. In some cases, unmeasured variables may confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). We show that, as in single-stage decision making, IVs enable us to correctly identify the target policy's value in infinite-horizon settings. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Methodology (stat.ME)
Cite as: arXiv:2212.14468 [stat.ML]
  (or arXiv:2212.14468v2 [stat.ML] for this version)

Submission history

From: Yang Xu
[v1] Thu, 29 Dec 2022 22:06:51 GMT (538kb,D)
[v2] Thu, 2 Feb 2023 18:08:26 GMT (440kb,D)
