Importance Sampling based Exploration in Q Learning

Kumar, Vijay; Webster, Mort

(Submitted on 1 Jul 2021)

Abstract: Approximate Dynamic Programming (ADP) is a methodology for solving multi-stage stochastic optimization problems in multi-dimensional discrete or continuous spaces. ADP approximates the optimal value function by adaptively sampling both the action and state spaces. It provides a tractable approach to very large problems, but can suffer from the exploration-exploitation dilemma. To address this dilemma, we propose a novel approach for selecting actions in continuous decision spaces using importance sampling weighted by the value function approximation. Unlike other exploration approaches such as Epsilon Greedy, this approach balances exploration and exploitation when sampling actions without any tuning parameters, relying only on the approximate value function. We compare the proposed algorithm with other exploration strategies in continuous action spaces on a multi-stage generation expansion planning problem under uncertainty.
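
The abstract does not spell out the sampling rule, so the following is only a minimal sketch of value-weighted action selection under stated assumptions, not the authors' algorithm. It assumes a one-dimensional action interval, a uniform proposal distribution, a maximization objective, and weights equal to each candidate's approximate value shifted by the smallest sampled value. The names `sample_action_by_value`, `q_approx`, and `n_candidates` are hypothetical.

```python
import random


def sample_action_by_value(q_approx, state, low, high, n_candidates=64, rng=random):
    """Draw one action in [low, high], favoring higher approximate values.

    Assumptions (illustrative, not taken from the paper): candidates are
    drawn from a uniform proposal and one is resampled with probability
    proportional to its approximate value minus the smallest sampled value.
    """
    candidates = [rng.uniform(low, high) for _ in range(n_candidates)]
    values = [q_approx(state, a) for a in candidates]
    v_min = min(values)
    # Shift so every weight is non-negative; the worst candidate gets weight 0.
    weights = [v - v_min for v in values]
    if sum(weights) == 0:
        # Flat value estimate: fall back to uniform sampling over candidates.
        weights = [1.0] * len(candidates)
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical value estimate that peaks at action 2.0.
    q = lambda state, action: -(action - 2.0) ** 2
    rng = random.Random(0)
    draws = [sample_action_by_value(q, None, 0.0, 4.0, rng=rng) for _ in range(5)]
    print(draws)
```

In this sketch, high-value actions are drawn more often while low-value actions keep a nonzero chance of being tried, so the degree of exploration follows the shape of the value estimate rather than a separate exploration rate such as the epsilon in Epsilon Greedy. The candidate count is a computational setting of the sketch itself.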

Subjects:	Optimization and Control (math.OC)
Cite as:	arXiv:2107.00602 [math.OC]
	(or arXiv:2107.00602v1 [math.OC] for this version)

Submission history

From: Vijay Kumar
[v1] Thu, 1 Jul 2021 16:48:35 GMT (495kb,D)
