Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets

Štrupl, Miroslav; Faccio, Francesco; Ashley, Dylan R.; Schmidhuber, Jürgen; Srivastava, Rupesh Kumar

(Submitted on 13 May 2022)

Abstract: Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time. Ghosh et al. proved that Goal-Conditional Supervised Learning (GCSL) -- which can be viewed as a simplified version of UDRL -- optimizes a lower bound on goal-reaching performance. This raises expectations that such algorithms may enjoy guaranteed convergence to the optimal policy in arbitrary environments, similar to certain well-known traditional RL algorithms. Here we show that for a specific episodic UDRL algorithm (eUDRL, including GCSL), this is not the case, and give the causes of this limitation. To do so, we first introduce a helpful rewrite of eUDRL as a recursive policy update. This formulation helps to disprove its convergence to the optimal policy for a wide class of stochastic environments. Finally, we provide a concrete example of a very simple environment where eUDRL diverges. Since the primary aim of this paper is to present a negative result, and the best counterexamples are the simplest ones, we restrict all discussions to finite (discrete) environments, ignoring issues of function approximation and limited sample size.

Comments:	presented at the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making; 5 pages in main text + 1 page of references + 3 pages of appendices, 1 figure in main text; source code available at this https URL
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
MSC classes:	68T05
ACM classes:	I.2.6
Cite as:	arXiv:2205.06595 [stat.ML]
	(or arXiv:2205.06595v1 [stat.ML] for this version)

Submission history

From: Dylan Ashley
[v1] Fri, 13 May 2022 12:43:25 GMT (155kb,D)
