Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints

Low, Siow Meng; Kumar, Akshat

Full-text links:

Download:

Current browse context:

cs.LG

< prev | next >

new | recent | 2405

Computer Science > Machine Learning

Title: Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints

Authors: Siow Meng Low, Akshat Kumar

(Submitted on 5 May 2024)

Abstract: In safe Reinforcement Learning (RL), safety cost is typically defined as a function dependent on the immediate state and actions. In practice, safety constraints can often be non-Markovian due to the insufficient fidelity of state representation, and safety cost may not be known. We therefore address a general setting where safety labels (e.g., safe or unsafe) are associated with state-action trajectories. Our key contributions are: first, we design a safety model that specifically performs credit assignment to assess contributions of partial state-action trajectories on safety. This safety model is trained using a labeled safety dataset. Second, using RL-as-inference strategy we derive an effective algorithm for optimizing a safe policy using the learned safety model. Finally, we devise a method to dynamically adapt the tradeoff coefficient between reward maximization and safety compliance. We rewrite the constrained optimization problem into its dual problem and derive a gradient-based method to dynamically adjust the tradeoff coefficient during training. Our empirical results demonstrate that this approach is highly scalable and able to satisfy sophisticated non-Markovian safety constraints.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2405.03005 [cs.LG]
	(or arXiv:2405.03005v1 [cs.LG] for this version)

Submission history

From: Siow Meng Low [view email]
[v1] Sun, 5 May 2024 17:27:22 GMT (705kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> cs > arXiv:2405.03005

Download:

Current browse context:

Change to browse by:

References & Citations

DBLP - CS Bibliography

Bookmark

Computer Science > Machine Learning

Title: Safe Reinforcement Learning with Learned Non-Markovian Safety Constraints

Submission history