We gratefully acknowledge support from
the Simons Foundation and member institutions.
Full-text links:


Current browse context:


Change to browse by:

References & Citations

DBLP - CS Bibliography


(what is this?)
CiteULike logo BibSonomy logo Mendeley logo del.icio.us logo Digg logo Reddit logo ScienceWISE logo

Computer Science > Machine Learning

Title: Designing an offline reinforcement learning objective from scratch

Abstract: Offline reinforcement learning has developed rapidly over the recent years, but estimating the actual performance of offline policies still remains a challenge. We propose a scoring metric for offline policies that highly correlates with actual policy performance and can be directly used for offline policy optimization in a supervised manner. To achieve this, we leverage the contrastive learning framework to design a scoring metric that gives high scores to policies that imitate the actions yielding relatively high returns while avoiding those yielding relatively low returns. Our experiments show that 1) our scoring metric is able to more accurately rank offline policies and 2) the policies optimized using our metric show high performance on various offline reinforcement learning benchmarks. Notably, our algorithm has a much lower network capacity requirement for the policy network compared to other supervised learning-based methods and also does not need any additional networks such as a Q-network.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2301.12842 [cs.LG]
  (or arXiv:2301.12842v1 [cs.LG] for this version)

Submission history

From: Gaon An [view email]
[v1] Mon, 30 Jan 2023 12:51:13 GMT (2318kb,D)

Link back to: arXiv, form interface, contact.