Unifying PAC and Regret: Uniform PAC Bounds for Episodic Reinforcement Learning

Dann, Christoph; Lattimore, Tor; Brunskill, Emma

(Submitted on 22 Mar 2017 (v1), last revised 2 Jan 2018 (this version, v3))

Abstract: Statistical performance bounds for reinforcement learning (RL) algorithms can be critical for high-stakes applications like healthcare. This paper introduces a new framework for theoretically measuring the performance of such algorithms called Uniform-PAC, which is a strengthening of the classical Probably Approximately Correct (PAC) framework. In contrast to the PAC framework, the uniform version may be used to derive high-probability regret guarantees and so forms a bridge between the two setups that has been missing in the literature. We demonstrate the benefits of the new framework for finite-state episodic MDPs with a new algorithm that is Uniform-PAC and simultaneously achieves optimal regret and PAC guarantees except for a factor of the horizon.
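
As a rough illustration of why a guarantee that holds for every accuracy level at once can be turned into a regret statement, the sketch below compares the two quantities on a made-up sequence of per-episode optimality gaps. It is not the paper's algorithm or proof. The gap values and the function names (`count_eps_suboptimal`, `regret`, `regret_from_counts`) are assumptions introduced here for illustration only. A PAC-style quantity counts the episodes whose gap exceeds a threshold eps, while regret sums the gaps. Integrating the count over all eps recovers the regret, so controlling the count for all eps simultaneously also controls the sum.

```python
from typing import Sequence


def count_eps_suboptimal(gaps: Sequence[float], eps: float) -> int:
    """Number of episodes whose optimality gap is strictly larger than eps."""
    return sum(1 for g in gaps if g > eps)


def regret(gaps: Sequence[float]) -> float:
    """Cumulative regret: the sum of the per-episode optimality gaps."""
    return float(sum(gaps))


def regret_from_counts(gaps: Sequence[float]) -> float:
    """Recover the regret by integrating eps -> count_eps_suboptimal(gaps, eps).

    The count is a step function of eps that only changes at the distinct
    gap values, so the integral is an exact finite sum of rectangles.
    """
    levels = sorted({g for g in gaps if g > 0})
    total, prev = 0.0, 0.0
    for level in levels:
        # The count is constant for eps in [prev, level).
        total += (level - prev) * count_eps_suboptimal(gaps, prev)
        prev = level
    return total


if __name__ == "__main__":
    # Hypothetical gaps for five episodes (not data from the paper).
    example_gaps = [0.5, 0.25, 0.25, 0.0, 0.125]
    for eps in (0.1, 0.2, 0.3):
        print(eps, count_eps_suboptimal(example_gaps, eps))
    print(regret(example_gaps), regret_from_counts(example_gaps))
```

A bound on the count for one fixed eps says nothing about how large the gaps below eps are or about the count at smaller thresholds, which is why the sketch integrates over every threshold rather than evaluating a single one.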

Comments:	appears in Neural Information Processing Systems 2017
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1703.07710 [cs.LG]
	(or arXiv:1703.07710v3 [cs.LG] for this version)

Submission history

From: Christoph Dann
[v1] Wed, 22 Mar 2017 15:34:23 GMT (928kb,D)
[v2] Tue, 26 Sep 2017 21:04:38 GMT (935kb,D)
[v3] Tue, 2 Jan 2018 13:25:46 GMT (1300kb,D)
