Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Gangwani, Tanmay; Peng, Jian; Zhou, Yuan

(Submitted on 5 Nov 2020)

Abstract: Quality-Diversity (QD) is a concept from Neuroevolution with some intriguing applications to Reinforcement Learning. It facilitates learning a population of agents where each member is optimized to simultaneously accumulate high task-returns and exhibit behavioral diversity compared to other members. In this paper, we build on a recent kernel-based method for training a QD policy ensemble with Stein variational gradient descent. With kernels based on $f$-divergence between the stationary distributions of policies, we convert the problem to that of efficient estimation of the ratio of these stationary distributions. We then study various distribution ratio estimators used previously for off-policy evaluation and imitation and repurpose them to compute the gradients for policies in an ensemble such that the resultant population is diverse and of high quality.
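
The step from an $f$-divergence to a ratio-estimation problem can be illustrated with a toy example. The sketch below is not taken from the paper and is not necessarily one of the estimators it studies: it uses a generic classifier-based estimator, in which a logistic classifier trained to separate samples of P from samples of Q has a logit that approximates log p(x)/q(x) when both sample sets have the same size. Two 1-D Gaussians stand in (as an assumption) for the stationary distributions of two policies, and the function names fit_log_ratio and estimate_kl are hypothetical. The KL divergence, one member of the $f$-divergence family, is then estimated as the mean log-ratio over samples from P.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_log_ratio(xs_p, xs_q, steps=300, lr=1.0):
    """Fit logit(x) = w*x + b separating P samples (label 1) from Q samples (label 0).

    With equally many samples from P and Q, the optimal logit equals
    log p(x)/q(x). A linear logit is exact for two Gaussians with equal
    variance, which is the toy setting assumed here.
    """
    data = [(x, 1.0) for x in xs_p] + [(x, 0.0) for x in xs_q]
    n = len(data)
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        # Full-batch gradient descent on the mean log-loss.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def estimate_kl(xs_p, w, b):
    # KL(P || Q) = E_P[log p(x)/q(x)], approximated with the fitted log-ratio.
    return sum(w * x + b for x in xs_p) / len(xs_p)


random.seed(0)
xs_p = [random.gauss(0.0, 1.0) for _ in range(2000)]  # samples from P = N(0, 1)
xs_q = [random.gauss(1.0, 1.0) for _ in range(2000)]  # samples from Q = N(1, 1)

w, b = fit_log_ratio(xs_p, xs_q)
kl_hat = estimate_kl(xs_p, w, b)
print(f"w={w:.2f}, b={b:.2f}, KL estimate={kl_hat:.2f}")
```

For these two Gaussians the true log-ratio is -x + 0.5 and the analytic KL divergence is 0.5, so the fitted parameters and the estimate can be checked against known values. In the setting the abstract describes, the samples would instead come from the stationary distributions of the policies in the ensemble, where no closed form is available.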

Comments:	CoRL 2020 camera-ready
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2011.02614 [cs.LG]
	(or arXiv:2011.02614v1 [cs.LG] for this version)

Submission history

From: Tanmay Gangwani
[v1] Thu, 5 Nov 2020 02:02:44 GMT (1444kb,D)
