Title: Distributed Adaptive Reinforcement Learning: A Method for Optimal Routing

Abstract: This paper presents a learning-based optimal transportation algorithm for autonomous taxis and ridesharing vehicles. The goal is to design a mechanism that solves the routing problem for multiple autonomous vehicles and multiple customers so as to maximize the transportation company's profit: each vehicle selects the customer whose request maximizes the company's long-run profit. The system is first modeled as a Markov Decision Process (MDP) using past customer data. Solving this MDP offline yields a centralized high-level planning recommendation, which serves as the initial value for real-time learning. A distributed SARSA reinforcement learning algorithm is then proposed to capture model errors and changes in the environment, such as variations in customer distributions in each area, traffic, and fares, thereby providing optimal routing policies in real time. Vehicles, or agents, use only local information and interactions, such as current passenger requests and estimates of neighbors' tasks and their optimal actions, to obtain the optimal policies in a distributed fashion. An optimal adaptive rate is introduced so that the distributed SARSA algorithm can adapt to changes in the environment and track the time-varying optimal policies. Furthermore, a game-theory-based task assignment algorithm is proposed, in which each agent uses the optimal policies and their values from distributed SARSA to select its customer from the set of locally available requests in a distributed manner. Finally, customer data provided by the city of Chicago is used to validate the proposed algorithms.
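
For readers unfamiliar with SARSA, the following is a minimal sketch of the standard single-agent tabular update that the abstract's distributed variant builds on. It is not the paper's algorithm: the function name `sarsa_update`, the state and action labels, and the fixed step size `alpha` are assumptions made for illustration, whereas the paper introduces its own adaptive rate and neighbor interaction, neither of which is specified in the abstract.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.95):
    """Apply one textbook SARSA step to the table q and return the TD error.

    q maps (state, action) pairs to estimated long-run value.
    alpha is a fixed step size here (an assumption; the paper uses an
    adaptive rate that is not described in the abstract).
    """
    # On-policy target: uses the action actually chosen in the next state.
    td_target = reward + gamma * q[(next_state, next_action)]
    td_error = td_target - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical usage: a table that could be warm-started from an offline
# MDP solution, then refined online from one observed transition.
q_table = defaultdict(float)
q_table[("zone_A", "serve_request")] = 1.0  # illustrative initial value
sarsa_update(q_table, "zone_A", "serve_request", reward=2.0,
             next_state="zone_B", next_action="wait")
```

Initializing the table from an offline solution, as in the usage lines, mirrors the abstract's idea of using the MDP result as the starting point for real-time learning; the specific numbers are illustrative only.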
Subjects: Optimization and Control (math.OC)
Cite as: arXiv:2005.01976 [math.OC]
  (or arXiv:2005.01976v1 [math.OC] for this version)

Submission history

From: Salar Rahili
[v1] Tue, 5 May 2020 07:28:46 GMT