Generating Automatic Curricula via Self-Supervised Active Domain Randomization

Raparthy, Sharath Chandra; Mehta, Bhairav; Golemo, Florian; Paull, Liam

(Submitted on 18 Feb 2020 (v1), last revised 26 Oct 2020 (this version, v2))

Abstract: Goal-directed Reinforcement Learning (RL) traditionally considers an agent interacting with an environment, prescribing a real-valued reward to an agent proportional to the completion of some goal. Goal-directed RL has seen large gains in sample efficiency, due to the ease of reusing or generating new experience by proposing goals. One approach, self-play, allows an agent to "play" against itself by alternatively setting and accomplishing goals, creating a learned curriculum through which an agent can learn to accomplish progressively more difficult goals. However, self-play has been limited to goal curriculum learning or learning progressively harder goals within a single environment. Recent work on robotic agents has shown that varying the environment during training, for example with domain randomization, leads to more robust transfer. As a result, we extend the self-play framework to jointly learn a goal and environment curriculum, leading to an approach that learns the most fruitful domain randomization strategy with self-play. Our method, Self-Supervised Active Domain Randomization (SS-ADR), generates a coupled goal-task curriculum, where agents learn through progressively more difficult tasks and environment variations. By encouraging the agent to try tasks that are just outside of its current capabilities, SS-ADR builds a domain randomization curriculum that enables state-of-the-art results on various sim2real transfer tasks. Our results show that a curriculum of co-evolving the environment difficulty together with the difficulty of goals set in each environment provides practical benefits in the goal-directed tasks tested.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
Cite as:	arXiv:2002.07911 [cs.LG]
	(or arXiv:2002.07911v2 [cs.LG] for this version)

Submission history

From: Sharath Chandra Raparthy
[v1] Tue, 18 Feb 2020 22:45:29 GMT (1309kb,D)
[v2] Mon, 26 Oct 2020 18:24:29 GMT (1514kb,D)
