MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

Kargar, Eshagh; Kyrki, Ville

(Submitted on 2 Sep 2021)

Abstract: This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments and no communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). MACRPO integrates information across agents and time in two novel ways. First, we use a recurrent layer in the critic's network architecture and propose a new framework that trains this layer on a meta-trajectory. This allows the network to learn the cooperation and interaction dynamics between agents and to handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and the Particle environment. We compare the results with several ablations, with state-of-the-art multi-agent algorithms such as QMIX and MADDPG, and with single-agent methods that share parameters between agents, such as IMPALA and APEX. The results show that MACRPO outperforms the other algorithms. The code is available online at this https URL
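
The abstract names two mechanisms, a meta-trajectory for the recurrent critic and an advantage function that uses other agents' rewards and values, but gives no formulas for either. The sketch below is only one plausible reading, not the authors' implementation. The function names `build_meta_trajectory` and `team_gae` are hypothetical, interleaving agents by time step is an assumed way to form the meta-trajectory, and averaging rewards and values over agents inside a standard generalized advantage estimate is an assumed way to combine them.

```python
from typing import Any, List, Sequence, Tuple


def build_meta_trajectory(
    per_agent_obs: Sequence[Sequence[Any]],
) -> List[Tuple[int, int, Any]]:
    """Interleave per-agent observation sequences by time step (assumption).

    per_agent_obs[i][t] is agent i's observation at step t. The result is one
    sequence ordered (t=0, agent 0), (t=0, agent 1), ..., (t=1, agent 0), ...
    that a recurrent critic could consume as a single trajectory.
    """
    horizon = len(per_agent_obs[0])
    if any(len(obs) != horizon for obs in per_agent_obs):
        raise ValueError("all agents must have the same number of steps")
    return [
        (t, i, per_agent_obs[i][t])
        for t in range(horizon)
        for i in range(len(per_agent_obs))
    ]


def team_gae(
    rewards: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """GAE over agent-averaged rewards and values (assumed combination rule).

    rewards[i][t]: reward of agent i at step t, for t = 0..T-1.
    values[i][t]:  critic value for agent i at step t, for t = 0..T,
                   where the entry at T is the bootstrap value.
    """
    n_agents = len(rewards)
    horizon = len(rewards[0])
    mean_r = [sum(r[t] for r in rewards) / n_agents for t in range(horizon)]
    mean_v = [sum(v[t] for v in values) / n_agents for t in range(horizon + 1)]

    advantages = [0.0] * horizon
    running = 0.0
    # Standard backward recursion: A_t = delta_t + gamma * lam * A_{t+1}.
    for t in reversed(range(horizon)):
        delta = mean_r[t] + gamma * mean_v[t + 1] - mean_v[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With two agents and two steps, `build_meta_trajectory([["a0", "a1"], ["b0", "b1"]])` yields the order a0, b0, a1, b1, so each agent's observation at a given step sits next to the other agent's observation from the same step.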

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA); Robotics (cs.RO)
Cite as:	arXiv:2109.00882 [cs.LG]
	(or arXiv:2109.00882v1 [cs.LG] for this version)

Submission history

From: Eshagh Kargar
[v1] Thu, 2 Sep 2021 12:43:35 GMT (2744kb,D)
