Invariant Policy Learning: A Causal Perspective

Saengkyongam, Sorawit; Thams, Nikolaj; Peters, Jonas; Pfister, Niklas

(Submitted on 1 Jun 2021 (v1), last revised 22 Sep 2022 (this version, v4))

Abstract: Contextual bandit and reinforcement learning algorithms have been successfully used in various interactive learning systems such as online advertising, recommender systems, and dynamic pricing. However, they have yet to be widely adopted in high-stakes application domains, such as healthcare. One reason may be that existing approaches assume that the underlying mechanisms are static in the sense that they do not change over different environments. In many real-world systems, however, the mechanisms are subject to shifts across environments which may invalidate the static environment assumption. In this paper, we take a step toward tackling the problem of environmental shifts considering the framework of offline contextual bandits. We view the environmental shift problem through the lens of causality and propose multi-environment contextual bandits that allow for changes in the underlying mechanisms. We adopt the concept of invariance from the causality literature and introduce the notion of policy invariance. We argue that policy invariance is only relevant if unobserved variables are present and show that, in that case, an optimal invariant policy is guaranteed to generalize across environments under suitable assumptions. Our results establish concrete connections among causality, invariance, and contextual bandits.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:2106.00808 [cs.LG]
	(or arXiv:2106.00808v4 [cs.LG] for this version)

Submission history

From: Sorawit Saengkyongam
[v1] Tue, 1 Jun 2021 21:20:48 GMT (109kb,D)
[v2] Mon, 7 Jun 2021 09:46:47 GMT (109kb,D)
[v3] Wed, 11 Aug 2021 14:46:27 GMT (136kb,D)
[v4] Thu, 22 Sep 2022 08:45:10 GMT (188kb,D)
