Title: Model-Based Offline Planning with Trajectory Pruning
(Submitted on 16 May 2021 (v1), last revised 21 Apr 2022 (this version, v3))
Abstract: Recent offline reinforcement learning (RL) studies have made substantial progress toward making RL usable in real-world systems by learning policies from pre-collected datasets without environment interaction. Unfortunately, existing offline RL methods still face many practical challenges in real-world system control tasks, such as computational restrictions during agent training and the need for extra control flexibility. The model-based planning framework provides an attractive alternative. However, most model-based planning algorithms are not designed for offline settings. Simply combining the ingredients of offline RL with existing methods either yields over-restrictive planning or leads to inferior performance. We propose a new lightweight model-based offline planning framework, called MOPP, which tackles the dilemma between the restrictions of offline learning and high-performance planning. MOPP encourages more aggressive trajectory rollouts guided by the behavior policy learned from data, and prunes out problematic trajectories to avoid potential out-of-distribution samples. Experimental results show that MOPP provides competitive performance compared with existing model-based offline planning and RL approaches.
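The abstract describes the planning loop only at a high level. The following is a hypothetical sketch, not the authors' MOPP implementation. It assumes a behavior policy learned from offline data, an ensemble of learned dynamics models, Gaussian noise added to the behavior action as the "aggressive" rollout, and ensemble disagreement as the out-of-distribution signal used for pruning. Every name here (`behavior_policy`, `make_ensemble`, `plan`, `max_disagreement`) and the toy 1-D dynamics are invented for illustration.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical toy setup: 1-D state and action. Nothing here is taken from the paper.

def behavior_policy(state: float) -> float:
    """Stand-in for a behavior policy learned from offline data (assumed linear)."""
    return -0.5 * state

def make_ensemble(scales: List[float]) -> List[Callable[[float, float], float]]:
    """Toy dynamics ensemble: members agree for small actions and diverge for large ones."""
    return [lambda s, a, k=k: s + a + k * a * abs(a) for k in scales]

def reward(state: float) -> float:
    """Assumed reward: drive the state toward zero."""
    return -state ** 2

def plan(state: float, models, horizon: int = 5, n_samples: int = 64,
         noise: float = 1.0, max_disagreement: float = 0.5,
         rng: Optional[random.Random] = None) -> Tuple[float, int]:
    """Return (first action of the best surviving trajectory, number of trajectories kept)."""
    rng = rng or random.Random(0)
    best_return, best_first_action, kept = float("-inf"), None, 0
    for _ in range(n_samples):
        s, total, first_action, pruned = state, 0.0, None, False
        for _step in range(horizon):
            # Behavior-guided action plus extra noise: a more aggressive rollout.
            a = behavior_policy(s) + rng.gauss(0.0, noise)
            if first_action is None:
                first_action = a
            preds = [m(s, a) for m in models]
            # Prune the whole trajectory once the ensemble disagrees too much.
            if max(preds) - min(preds) > max_disagreement:
                pruned = True
                break
            s = sum(preds) / len(preds)  # roll forward with the ensemble mean
            total += reward(s)
        if pruned:
            continue
        kept += 1
        if total > best_return:
            best_return, best_first_action = total, first_action
    if best_first_action is None:
        # Every trajectory was pruned: fall back to the behavior policy itself.
        return behavior_policy(state), kept
    return best_first_action, kept
```

For example, `plan(1.0, make_ensemble([0.0, 0.1, 0.2]))` returns a first action and the count of unpruned trajectories. Lowering `max_disagreement` prunes more rollouts and pushes the planner back toward the behavior policy, which is the trade-off between aggressive planning and staying in-distribution that the abstract describes.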
Submission history
From: Xianyuan Zhan
[v1] Sun, 16 May 2021 05:00:54 GMT (2280kb,D)
[v2] Mon, 27 Sep 2021 01:58:59 GMT (4399kb,D)
[v3] Thu, 21 Apr 2022 08:43:40 GMT (5979kb,D)