Holistic design for deep learning-based discovery of tabular structures in datasheet images


Kara E., Traquair M., Simsek M., Kantarci B., Khan S.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, vol.90, 2020 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 90
  • Publication Date: 2020
  • Doi Number: 10.1016/j.engappai.2020.103551
  • Journal Name: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Civil Engineering Abstracts
  • Istanbul Technical University Affiliated: Yes

Abstract

Extracting data from tabular structures contained within product datasheets is crucial in many contexts, particularly in the management and optimization of supply chains that serve various industries. To minimize human intervention, table detection and table structure detection form the essential functionality. However, a self-contained holistic solution that extracts the tables as well as their columns and rows is not readily available. To address this challenge, this study presents a new formal procedure that consists of the following sequence: table detection, structure segmentation, and holistic tabular structure detection on documents. The proposed table detection model outperforms the state-of-the-art solutions by achieving a recall of 1.0 and a precision of more than 0.99 on public competition datasets. Furthermore, this work introduces a judging mechanism and an agreement-based post-processing procedure to incorporate hand-crafted rules into the deep learning models. Although the individual components achieve a new state-of-the-art F1-score, the best F-measure achieved by the integrated holistic system is 0.89.
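
For readers less familiar with the metrics quoted above, the following is a minimal sketch of how an F-measure is conventionally derived from precision and recall. The function name `f_measure` and the example inputs are illustrative assumptions, not code or exact figures from the paper; the abstract only states a recall of 1.0 and a precision above 0.99 for table detection.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the usual F1 (harmonic mean)."""
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    denominator = beta ** 2 * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero: define the score as 0.
        return 0.0
    return (1.0 + beta ** 2) * precision * recall / denominator


# Assumed inputs: precision 0.99 is used as a lower bound, since the abstract
# reports "more than 0.99"; recall 1.0 is the value stated in the abstract.
detection_f1_lower_bound = f_measure(precision=0.99, recall=1.0)
print(f"Table detection F1 is at least {detection_f1_lower_bound:.4f}")
```

Under these assumed inputs the table detection F1 comes out at roughly 0.995 or higher, which can be set against the 0.89 F-measure reported for the integrated holistic system.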