A Multi-channel Deep Learning Architecture for Understanding the Urban Scene Semantics

Demirtas T., PARLAK İ. B.

4th International Conference on Intelligent and Fuzzy Systems (INFUS), Bornova, Turkey, 19 - 21 July 2022, vol.505, pp.101-108

  • Publication Type: Conference Paper / Full Text
  • Volume: 505
  • Doi Number: 10.1007/978-3-031-09176-6_12
  • City: Bornova
  • Country: Turkey
  • Page Numbers: pp.101-108
  • Keywords: Smart city, Urban scene, Image semantics, Machine learning, Artificial neural networks
  • Istanbul Technical University Affiliated: No


Smart city analysis is an emerging field for autonomous urban problems. Image semantics is a complex problem in which image classification, semantic segmentation, and object detection subroutines are staged in a cascade framework over spatio-temporal datasets. Urban scene analysis has been applied in several areas, such as security, autonomous vehicles, and mass transport. The initial problem in an urban scene is characterized as the pursuit of discrete 2D-3D movements on the street. An accurate segmentation of the scene is therefore required to minimize spatial gaps in the scene. The complete framework provides critical decision-making tools to protect people and moving objects. New-generation autonomous systems use real-time sensors to monitor all possible movements in a street. However, the sensed data must be interpreted semantically to generate city insights. Semantic segmentation assigns a semantic class to every pixel, including the background. Object detection locates the objects in an image and identifies their types or classes. Instance segmentation combines these two tasks. This paper performs instance segmentation of base objects on the Cityscapes dataset. The YOLACT deep learning architecture is applied to high-resolution images. The method is fast because it requires only one-stage segmentation. We conclude that the YOLACT architecture generates feasible labels on an accurate dataset where spatial gaps are smaller. Smart city analysis would be better served by new hierarchical labels.
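
The relationship between the three tasks named in the abstract can be illustrated with a toy example. The sketch below is not YOLACT and is not taken from the paper; it assumes a tiny hand-written label grid with hypothetical class ids, and it uses a plain 4-connected flood fill (a stand-in technique chosen only for illustration) to split a semantic label map into instances. Each resulting instance carries a class id, a bounding box (the detection part), and a pixel mask (the segmentation part).

```python
from collections import deque

# Hypothetical class ids, chosen for this illustration only.
BACKGROUND, CAR, PERSON = 0, 1, 2
CLASS_NAMES = {BACKGROUND: "background", CAR: "car", PERSON: "person"}

# Toy semantic label map: every pixel, background included, has a class id.
semantic_map = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 0],
    [0, 0, 0, 0, 2, 0],
    [1, 1, 0, 0, 0, 0],
]


def extract_instances(label_map, background=BACKGROUND):
    """Split a semantic label map into instances via 4-connected flood fill.

    Each instance gets a class id, a bounding box given as
    (min_row, min_col, max_row, max_col), and a set of (row, col) mask pixels.
    """
    height, width = len(label_map), len(label_map[0])
    seen = [[False] * width for _ in range(height)]
    instances = []
    for r in range(height):
        for c in range(width):
            cls = label_map[r][c]
            if cls == background or seen[r][c]:
                continue
            # Breadth-first flood fill over same-class neighbours.
            pixels = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and label_map[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            instances.append({
                "class_id": cls,
                "box": (min(ys), min(xs), max(ys), max(xs)),
                "mask": set(pixels),
            })
    return instances


instances = extract_instances(semantic_map)
for index, inst in enumerate(instances):
    print(index, CLASS_NAMES[inst["class_id"]], inst["box"], len(inst["mask"]))
```

In this toy grid the two separate car regions share one semantic class but become two distinct instances, which is the distinction between semantic and instance segmentation. A learned one-stage model such as YOLACT predicts instance masks directly from the image rather than deriving them from a label map in this way.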