Holistic design for deep learning-based discovery of tabular structures in datasheet images


Kara E., Traquair M., Simsek M., Kantarci B., Khan S.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, vol.90, 2020 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 90
  • Publication Date: 2020
  • DOI: 10.1016/j.engappai.2020.103551
  • Journal Name: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Civil Engineering Abstracts
  • Affiliated with Istanbul Technical University: Yes

Abstract

Extracting data from tabular structures contained within product datasheets is crucial in many contexts, particularly in the management and optimization of supply chains that serve various industries. To minimize human intervention, table detection and table structure detection are the essential functions. However, a self-contained holistic solution that extracts the tables as well as their columns and rows is not readily available. To address this challenge, this study presents a new formal procedure that consists of the following sequence: table detection, structure segmentation, and holistic tabular structure detection on documents. The proposed table detection model outperforms the state-of-the-art solutions by achieving a recall of 1.0 and a precision of more than 0.99 on public competition datasets. Furthermore, this work introduces a judging mechanism and an agreement-based post-processing procedure to incorporate hand-crafted rules into the deep learning models. Although the individual components achieve a new state-of-the-art F1-score, the best F-measure achieved by the integrated holistic system is 0.89.
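
For readers comparing the reported figures, the F-measure (F1-score) is the harmonic mean of precision and recall. The following is a minimal sketch, not code from the paper; the function name `f_measure` is hypothetical, and the inputs are the table detection values quoted in the abstract, with 0.99 used as a lower bound on precision.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the F1-score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention: define F1 as 0 when both precision and recall are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Table detection figures from the abstract: recall 1.0, precision > 0.99.
# Using 0.99 gives a lower bound on the detection F1-score (assumption:
# the exact precision is not stated in the abstract).
detection_f1_lower_bound = f_measure(precision=0.99, recall=1.0)
print(f"Table detection F1 is at least {detection_f1_lower_bound:.4f}")
```

Under these assumptions the table detection F1-score is just under 0.995 or higher, which gives a sense of how much accuracy is lost when the components are chained into the holistic system with its reported F-measure of 0.89.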