Statistics > Machine Learning
Title: POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning
(Submitted on 13 Jan 2020 (v1), last revised 31 Mar 2020 (this version, v2))
Abstract: Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
Submission history
From: Joseph Futoma
[v1] Mon, 13 Jan 2020 01:55:50 GMT (8141kb,D)
[v2] Tue, 31 Mar 2020 15:57:08 GMT (7196kb,D)