Evaluating the Performance of Reinforcement Learning Algorithms

Jordan, Scott M.; Chandak, Yash; Cohen, Daniel; Zhang, Mengxue; Thomas, Philip S.

(Submitted on 30 Jun 2020 (v1), last revised 13 Aug 2020 (this version, v2))

Abstract: Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning. Recent reproducibility analyses have shown that reported performance results are often inconsistent and difficult to replicate. In this work, we argue that the inconsistency of performance stems from the use of flawed evaluation metrics. Taking a step towards ensuring that reported results are consistent, we propose a new comprehensive evaluation methodology for reinforcement learning algorithms that produces reliable measurements of performance both on a single environment and when aggregated across environments. We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks.

Comments:	30 pages, 9 figures, Thirty-seventh International Conference on Machine Learning (ICML 2020)
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.16958 [cs.LG]
	(or arXiv:2006.16958v2 [cs.LG] for this version)

Submission history

From: Scott Jordan
[v1] Tue, 30 Jun 2020 16:52:23 GMT (3192kb,D)
[v2] Thu, 13 Aug 2020 16:05:28 GMT (3192kb,D)
