Computer Science > Machine Learning
Title: On a convergent off-policy temporal difference learning algorithm in on-line learning environment
(Submitted on 19 May 2016)
Abstract: In this paper we provide a rigorous convergence analysis of an "off"-policy temporal difference learning algorithm with linear function approximation and per-time-step linear computational complexity in an "online" learning environment. The algorithm considered here is TDC with importance weighting, introduced by Maei et al. We support our theoretical results with empirical results on standard off-policy counterexamples.
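As a rough illustration of the kind of update the abstract refers to, the following is a minimal sketch of one importance-weighted TDC step with linear function approximation. It is not taken from the paper: the function name `tdc_step`, the argument names, and the step sizes are hypothetical, and the placement of the importance weight `rho` in the auxiliary update differs between references, so the form below is an assumption. Each step uses only inner products and vector additions, so its cost is linear in the number of features.

```python
import numpy as np


def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One importance-weighted TDC update (assumed form, not quoted from the paper).

    theta    : main weight vector; the value estimate is theta @ phi
    w        : auxiliary weight vector used by the gradient correction
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance weight, target-policy prob. / behaviour-policy prob.
    alpha    : step size for theta (slower timescale)
    beta     : step size for w (faster timescale)
    """
    # Temporal difference error under the current value estimate.
    delta = reward + gamma * (theta @ phi_next) - theta @ phi
    # Scalar projection of the current features onto the auxiliary weights.
    phi_w = phi @ w
    # TD update plus the gradient-correction term, scaled by rho.
    theta_new = theta + alpha * rho * (delta * phi - gamma * phi_w * phi_next)
    # Auxiliary update; scaling this whole term by rho is an assumption.
    w_new = w + beta * rho * (delta - phi_w) * phi
    return theta_new, w_new


# Hypothetical single transition with two features.
theta0 = np.zeros(2)
w0 = np.zeros(2)
theta1, w1 = tdc_step(
    theta0, w0,
    phi=np.array([1.0, 0.0]),
    phi_next=np.array([0.0, 1.0]),
    reward=1.0, rho=2.0, gamma=0.9, alpha=0.1, beta=0.5,
)
```

Two separate step sizes are kept because two-timescale analyses of TDC typically let the auxiliary weights `w` adapt faster than `theta`; the specific values above are placeholders only.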
Submission history
From: Prasenjit Karmakar
[v1] Thu, 19 May 2016 18:32:50 GMT (242kb,D)