Order dispatching for an ultra-fast delivery service via deep reinforcement learning


Kavuk E. M., Tosun Kühn A., Cevik M., Bozanta A., Sonuc S. B., Tutuncu M., ...

APPLIED INTELLIGENCE, vol.52, no.4, pp.4274-4299, 2022 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 52 Issue: 4
  • Publication Date: 2022
  • DOI: 10.1007/s10489-021-02610-0
  • Journal Name: APPLIED INTELLIGENCE
  • Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, PASCAL, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, Educational research abstracts (ERA), INSPEC, Library, Information Science & Technology Abstracts (LISTA), zbMATH
  • Pages: pp.4274-4299
  • Keywords: On-demand delivery, Order dispatching, Reinforcement learning, Deep Q-networks, OPTIMIZATION, SYSTEM
  • Istanbul Technical University Affiliation: Yes

Abstract

This paper proposes a real-life application of deep reinforcement learning to the order dispatching problem of Getir, a Turkish ultra-fast delivery company. Before applying off-the-shelf reinforcement learning methods, we define the specific problem at Getir and describe one of the solutions the company has implemented. We discuss the novel aspects of Getir's problem relative to state-of-the-art order dispatching studies and highlight the limitations of Getir's solution. The company's overall aim is to deliver to as many customers as possible within 10 minutes. Orders arrive throughout the day, and centralized warehouses in each region decide whether an incoming order should be served or canceled, depending on the shifts and status of their couriers. We use Deep Q-networks (DQNs) to learn the warehouses' actions, i.e., accepting or canceling an order, directly from the state dimensions using reinforcement learning. We design the networks with two different reward functions. We conduct empirical analyses using real-life data provided by Getir, both to generate training samples and to assess the models' performance over a selected 30-day period with a total of 9,880 orders. The results indicate that our proposed models generate policies that outperform the rule-based heuristic employed in practice.
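The accept/cancel decision described above can be illustrated with a minimal DQN-style sketch. It includes epsilon-greedy action selection, an experience replay buffer, and a periodically synchronized target network. These are assumptions made for illustration and are not the paper's implementation. To keep the sketch dependency-free, a linear Q-function stands in for the deep network. The state features (`toy_order`) and the reward (`toy_reward`) are hypothetical and are not either of the paper's two rewards. The toy environment also ignores how accepting an order ties up a courier, which the real problem must account for.

```python
import random
from collections import deque

ACCEPT, CANCEL = 0, 1  # the two warehouse actions described in the abstract


class LinearQ:
    """Linear stand-in (assumption) for the deep Q-network: one weight vector per action."""

    def __init__(self, n_features, n_actions=2):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, state):
        return [sum(w * s for w, s in zip(w_a, state)) for w_a in self.w]

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]


def select_action(q_net, state, epsilon, rng):
    """Epsilon-greedy choice between accepting and canceling an incoming order."""
    if rng.random() < epsilon:
        return rng.choice((ACCEPT, CANCEL))
    q = q_net.q_values(state)
    return ACCEPT if q[ACCEPT] >= q[CANCEL] else CANCEL


def dqn_update(online, target, batch, gamma=0.9, lr=0.05):
    """One averaged gradient step toward y = r + gamma * max_a' Q_target(s', a')."""
    grads = [[0.0] * len(row) for row in online.w]
    for state, action, reward, next_state, done in batch:
        bootstrap = 0.0 if done else gamma * max(target.q_values(next_state))
        td_error = reward + bootstrap - online.q_values(state)[action]
        for i, s in enumerate(state):
            grads[action][i] += td_error * s
    for a, row in enumerate(grads):
        online.w[a] = [w + lr * g / len(batch) for w, g in zip(online.w[a], row)]


def toy_order(rng):
    """Hypothetical state: [bias, 1.0 if a courier is free else 0.0]."""
    return [1.0, float(rng.random() < 0.5)]


def toy_reward(state, action):
    """Hypothetical reward (not one of the paper's two): +1 for accepting when a
    courier is free, -1 for accepting when none is, 0 for canceling."""
    if action == CANCEL:
        return 0.0
    return 1.0 if state[1] == 1.0 else -1.0


def train(steps=4000, batch_size=32, sync_every=100, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    online, target = LinearQ(2), LinearQ(2)
    replay = deque(maxlen=500)  # experience replay buffer
    state = toy_order(rng)
    for step in range(steps):
        action = select_action(online, state, epsilon, rng)
        reward = toy_reward(state, action)
        next_state = toy_order(rng)
        # Orders keep arriving, so the toy task never terminates (done=False).
        replay.append((state, action, reward, next_state, False))
        state = next_state
        if len(replay) >= batch_size:
            dqn_update(online, target, rng.sample(list(replay), batch_size))
        if step % sync_every == 0:
            target.copy_from(online)  # periodic target-network refresh
    return online
```

After `policy = train()`, you can query the greedy decision with `select_action(policy, state, 0.0, random.Random())`. In this toy setting the learned policy should accept orders when a courier is free and cancel them otherwise. The separate target network keeps the bootstrap term fixed between synchronizations, which is the main stabilizing idea that distinguishes DQN from plain online Q-learning.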