Reinforcement Learning with a Terminator

Tennenholtz, Guy; Merlis, Nadav; Shani, Lior; Mannor, Shie; Shalit, Uri; Chechik, Gal; Hallak, Assaf; Dalal, Gal

(Submitted on 30 May 2022 (v1), last revised 5 Oct 2023 (this version, v2))

Abstract: We present the problem of reinforcement learning with exogenous termination. We define the Termination Markov Decision Process (TerMDP), an extension of the MDP framework, in which episodes may be interrupted by an external non-Markovian observer. This formulation accounts for numerous real-world situations, such as a human interrupting an autonomous driving agent for reasons of discomfort. We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret. Motivated by our theoretical analysis, we design and implement a scalable approach, which combines optimism (w.r.t. termination) and a dynamic discount factor, incorporating the termination probability. We deploy our method on high-dimensional driving and MinAtar benchmarks. Additionally, we test our approach on human data in a driving setting. Our results demonstrate fast convergence and significant improvement over various baseline approaches.
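
As a rough illustration of the last idea in the abstract, the sketch below shows one way a termination probability could be folded into a discount factor and combined with an optimistic termination estimate. It is not the paper's algorithm: the function names, the multiplicative form gamma * (1 - p), and the confidence-bonus subtraction are all assumptions made for illustration only.

```python
def optimistic_termination(p_hat: float, bonus: float) -> float:
    """Hypothetical optimistic estimate: assume termination is less likely
    than the point estimate p_hat by a confidence bonus, clipped to [0, 1]."""
    return min(1.0, max(0.0, p_hat - bonus))


def effective_discount(gamma: float, p_term: float) -> float:
    """Assumed form of a dynamic discount: the base discount gamma scaled by
    the probability that the episode is NOT terminated at this step."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    if not 0.0 <= p_term <= 1.0:
        raise ValueError("p_term must lie in [0, 1]")
    return gamma * (1.0 - p_term)


def td_target(reward: float, next_value: float, gamma: float, p_term: float) -> float:
    """One-step bootstrapped target that uses the dynamic discount."""
    return reward + effective_discount(gamma, p_term) * next_value


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    p_opt = optimistic_termination(p_hat=0.3, bonus=0.1)
    print(td_target(reward=1.0, next_value=10.0, gamma=0.9, p_term=p_opt))
```

Under these assumptions, a state where termination is likely contributes less future value to the target, so the agent is steered away from behavior the observer tends to interrupt, while the bonus keeps poorly explored states from being written off too early.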

Comments:	NeurIPS 2022
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2205.15376 [cs.LG]
	(or arXiv:2205.15376v2 [cs.LG] for this version)

Submission history

From: Guy Tennenholtz
[v1] Mon, 30 May 2022 18:40:28 GMT (2535kb,D)
[v2] Thu, 5 Oct 2023 19:02:39 GMT (1273kb,D)
