Order dispatching for an ultra-fast delivery service via deep reinforcement learning


Kavuk E. M., Tosun Kühn A., Cevik M., Bozanta A., Sonuc S. B., Tutuncu M., et al.

APPLIED INTELLIGENCE, vol.52, no.4, pp.4274-4299, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 52 Issue: 4
  • Publication Date: 2022
  • DOI: 10.1007/s10489-021-02610-0
  • Journal Name: APPLIED INTELLIGENCE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, PASCAL, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, Educational research abstracts (ERA), INSPEC, Library, Information Science & Technology Abstracts (LISTA), zbMATH
  • Page Numbers: pp.4274-4299
  • Keywords: On-demand delivery, Order dispatching, Reinforcement learning, Deep Q-networks, Optimization, System
  • Istanbul Technical University Affiliated: Yes

Abstract

This paper presents a real-life application of deep reinforcement learning to the order dispatching problem of a Turkish ultra-fast delivery company, Getir. Before applying off-the-shelf reinforcement learning methods, we define the specific problem at Getir and one of the solutions the company has implemented. We discuss the novel aspects of Getir's problem compared to state-of-the-art order dispatching studies and highlight the limitations of Getir's solution. The company's overall aim is to deliver to as many customers as possible within 10 minutes. Orders arrive throughout the day, and centralized warehouses in each region decide whether an incoming order should be served or canceled depending on their couriers' shifts and status. We use Deep Q-networks to learn the warehouses' actions, i.e., accepting or canceling an order, directly from the state representation, and we design the networks with two different reward functions. We conduct empirical analyses using real-life data provided by Getir to generate training samples and to assess the models' performance over a selected 30-day period with a total of 9880 orders. The results indicate that our proposed models generate policies that outperform the rule-based heuristic employed in practice.
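
To make the accept/cancel decision described in the abstract concrete, the sketch below shows a minimal Deep Q-network with a two-action output (accept or cancel an incoming order) over a small warehouse state vector. The state features, network size, reward design, and exploration settings here are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

# Hypothetical state features a warehouse might observe when an order arrives
# (illustrative only; the paper's actual state dimensions may differ):
# idle couriers, busy couriers, queue length, estimated travel time, hour of day.
STATE_DIM = 5
N_ACTIONS = 2  # 0 = cancel the order, 1 = accept the order


class DQN(nn.Module):
    """Minimal Q-network mapping a warehouse state to Q-values for cancel/accept."""

    def __init__(self, state_dim: int = STATE_DIM, n_actions: int = N_ACTIONS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


def choose_action(model: DQN, state: torch.Tensor, epsilon: float = 0.1) -> int:
    """Epsilon-greedy selection: mostly exploit the Q-values, occasionally explore."""
    if torch.rand(1).item() < epsilon:
        return int(torch.randint(N_ACTIONS, (1,)).item())
    with torch.no_grad():
        return int(model(state).argmax().item())


if __name__ == "__main__":
    model = DQN()
    # A single made-up incoming order: 3 idle couriers, 2 busy, 1 queued order,
    # 7.5-minute estimated travel time, arriving at 14:00.
    state = torch.tensor([3.0, 2.0, 1.0, 7.5, 14.0])
    action = choose_action(model, state)
    print("accept" if action == 1 else "cancel")
```

In a full training loop, the chosen action would be rewarded according to one of the two reward designs mentioned in the abstract (for example, rewarding on-time deliveries and penalizing cancellations), and the Q-network would be updated from replayed transitions in the usual DQN fashion.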