Scaling Learning based Policy Optimization for Temporal Tasks via Dropout

Hashemi, Navid; Hoxha, Bardh; Prokhorov, Danil; Fainekos, Georgios; Deshmukh, Jyotirmoy

Full-text links:

Download:

Current browse context:

eess.SY

< prev | next >

new | recent | 2403

Electrical Engineering and Systems Science > Systems and Control

Title: Scaling Learning based Policy Optimization for Temporal Tasks via Dropout

Authors: Navid Hashemi, Bardh Hoxha, Danil Prokhorov, Georgios Fainekos, Jyotirmoy Deshmukh

(Submitted on 23 Mar 2024)

Abstract: This paper introduces a model-based approach for training feedback controllers for an autonomous agent operating in a highly nonlinear environment. We desire the trained policy to ensure that the agent satisfies specific task objectives, expressed in discrete-time Signal Temporal Logic (DT-STL). One advantage for reformulation of a task via formal frameworks, like DT-STL, is that it permits quantitative satisfaction semantics. In other words, given a trajectory and a DT-STL formula, we can compute the robustness, which can be interpreted as an approximate signed distance between the trajectory and the set of trajectories satisfying the formula. We utilize feedback controllers, and we assume a feed forward neural network for learning these feedback controllers. We show how this learning problem is similar to training recurrent neural networks (RNNs), where the number of recurrent units is proportional to the temporal horizon of the agent's task objectives. This poses a challenge: RNNs are susceptible to vanishing and exploding gradients, and na\"{i}ve gradient descent-based strategies to solve long-horizon task objectives thus suffer from the same problems. To tackle this challenge, we introduce a novel gradient approximation algorithm based on the idea of dropout or gradient sampling. We show that, the existing smooth semantics for robustness are inefficient regarding gradient computation when the specification becomes complex. To address this challenge, we propose a new smooth semantics for DT-STL that under-approximates the robustness value and scales well for backpropagation over a complex specification. We show that our control synthesis methodology, can be quite helpful for stochastic gradient descent to converge with less numerical issues, enabling scalable backpropagation over long time horizons and trajectories over high dimensional state spaces.

Subjects:	Systems and Control (eess.SY); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2403.15826 [eess.SY]
	(or arXiv:2403.15826v1 [eess.SY] for this version)

Submission history

From: Navid Hashemi [view email]
[v1] Sat, 23 Mar 2024 12:53:51 GMT (2042kb,D)

Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)

Link back to: arXiv, form interface, contact.

> eess > arXiv:2403.15826

Download:

Current browse context:

Change to browse by:

References & Citations

Bookmark

Electrical Engineering and Systems Science > Systems and Control

Title: Scaling Learning based Policy Optimization for Temporal Tasks via Dropout

Submission history