Title: Generating Automatic Curricula via Self-Supervised Active Domain Randomization

Abstract: Goal-directed Reinforcement Learning (RL) traditionally considers an agent interacting with an environment, prescribing a real-valued reward to an agent proportional to the completion of some goal. Goal-directed RL has seen large gains in sample efficiency, due to the ease of reusing or generating new experience by proposing goals. One approach, self-play, allows an agent to "play" against itself by alternately setting and accomplishing goals, creating a learned curriculum through which an agent can learn to accomplish progressively more difficult goals. However, self-play has been limited to goal curriculum learning, or learning progressively harder goals within a single environment. Recent work on robotic agents has shown that varying the environment during training, for example with domain randomization, leads to more robust transfer. As a result, we extend the self-play framework to jointly learn a goal and environment curriculum, leading to an approach that learns the most fruitful domain randomization strategy with self-play. Our method, Self-Supervised Active Domain Randomization (SS-ADR), generates a coupled goal-task curriculum, where agents learn through progressively more difficult tasks and environment variations. By encouraging the agent to try tasks that are just outside of its current capabilities, SS-ADR builds a domain randomization curriculum that enables state-of-the-art results on various sim2real transfer tasks. Our results show that a curriculum of co-evolving the environment difficulty together with the difficulty of goals set in each environment provides practical benefits in the goal-directed tasks tested.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Machine Learning (stat.ML)
Cite as: arXiv:2002.07911 [cs.LG]
  (or arXiv:2002.07911v2 [cs.LG] for this version)

Submission history

From: Sharath Chandra Raparthy
[v1] Tue, 18 Feb 2020 22:45:29 GMT (1309kb,D)
[v2] Mon, 26 Oct 2020 18:24:29 GMT (1514kb,D)
