Reinforcement Learning for Classical Planning: Viewing Heuristics as Dense Reward Generators

Gehring, Clement; Asai, Masataro; Chitnis, Rohan; Silver, Tom; Kaelbling, Leslie Pack; Sohrabi, Shirin; Katz, Michael

(Submitted on 30 Sep 2021 (v1), last revised 7 Mar 2022 (this version, v2))

Abstract: Recent advances in reinforcement learning (RL) have led to a growing interest in applying RL to classical planning domains or applying classical planning methods to some complex RL domains. However, the long-horizon goal-based problems found in classical planning lead to sparse rewards for RL, making direct application inefficient. In this paper, we propose to leverage domain-independent heuristic functions commonly used in the classical planning literature to improve the sample efficiency of RL. These classical heuristics act as dense reward generators to alleviate the sparse-rewards issue and enable our RL agent to learn domain-specific value functions as residuals on these heuristics, making learning easier. Correct application of this technique requires consolidating the discounted metric used in RL and the non-discounted metric used in heuristics. We implement the value functions using Neural Logic Machines, a neural network architecture designed for grounded first-order logic inputs. We demonstrate on several classical planning domains that using classical heuristics for RL allows for good sample efficiency compared to sparse-reward RL. We further show that our learned value functions generalize to novel problem instances in the same domain.
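The idea of a heuristic acting as a dense reward generator can be illustrated with potential-based reward shaping. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes a reward of -1 per step, uses the negated heuristic as the shaping potential, and converts the undiscounted heuristic value h into a discounted one so that it is on the same scale as a discounted RL return. The function names `discounted_value` and `shaped_reward` are hypothetical.

```python
def discounted_value(h: float, gamma: float) -> float:
    """Discounted return of h unit-cost steps, assuming a reward of -1 per step.

    This maps an undiscounted heuristic estimate (a step count) onto the
    discounted scale used by an RL value function.
    """
    if gamma == 1.0:
        # No discounting: the return is just the negated step count.
        return -float(h)
    # Geometric series: -(1 + gamma + ... + gamma^(h-1)).
    return -(1.0 - gamma ** h) / (1.0 - gamma)


def shaped_reward(reward: float, h_s: float, h_next: float, gamma: float) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s).

    The potential phi is the discounted value implied by the heuristic
    (an assumption of this sketch).
    """
    phi_s = discounted_value(h_s, gamma)
    phi_next = discounted_value(h_next, gamma)
    return reward + gamma * phi_next - phi_s


if __name__ == "__main__":
    gamma = 0.99
    # A step that reduces the heuristic from 5 to 4 (moving toward the goal).
    print("toward goal:", shaped_reward(-1.0, 5, 4, gamma))
    # A step that increases the heuristic from 5 to 6 (moving away).
    print("away from goal:", shaped_reward(-1.0, 5, 6, gamma))
```

Under these assumptions, every step receives an informative signal: a step that follows an exact heuristic gets a shaped reward of approximately zero, while a step that moves away from the goal is penalized. A value function trained on the shaped rewards then only needs to represent the residual between the heuristic's estimate and the true value.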

Comments:	Equal contributions by the first two authors. This manuscript is a camera-ready version accepted in ICAPS-2022. It is significantly updated from past versions (e.g., in the ICAPS PRL (Planning and RL) workshop) with additional experiments comparing existing work (STRIPS-HGN (Shen, Trevizan, and Thiebaux 2020) and GBFS-GNN (Rivlin, Hazan, and Karpas 2019))
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2109.14830 [cs.AI]
	(or arXiv:2109.14830v2 [cs.AI] for this version)

Submission history

From: Masataro Asai
[v1] Thu, 30 Sep 2021 03:36:01 GMT (9246kb,D)
[v2] Mon, 7 Mar 2022 18:51:01 GMT (5184kb,D)
