Learning with Type-2 Fuzzy activation functions to improve the performance of Deep Neural Networks


Beke A., Kumbasar T.

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, vol.85, pp.372-384, 2019 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 85
  • Publication Date: 2019
  • Doi Number: 10.1016/j.engappai.2019.06.016
  • Journal Name: ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.372-384
  • Keywords: Interval Type-2 Fuzzy systems, Footprint of uncertainty, Deep Neural Networks, Deep learning, Activation units, INTERVAL TYPE-2, MODEL
  • Istanbul Technical University Affiliated: Yes

Abstract

In this study, we propose a novel Interval Type-2 (IT2) Fuzzy activation layer composed of Single input IT2 (SIT2) Fuzzy Rectifying Units (FRUs) to improve the learning performance of Deep Neural Networks (DNNs). The SIT2-FRU has tunable parameters that define not only the slopes in the positive and negative quadrants but also the characteristic of the input-output mapping of the activation function. The SIT2-FRU also alleviates the vanishing gradient problem and converges quickly, since it can push the mean activation toward zero by processing inputs defined in the negative quadrant. Thus, the SIT2-FRU allows the DNN to learn better, as it is capable of expressing linear or sophisticated input-output mappings simply by tuning the footprint of uncertainty of its IT2 fuzzy sets. To examine the performance of the SIT2-FRU, comparative experimental studies are performed on the MNIST, Quickdraw Pictionary, and CIFAR-10 benchmark datasets. The proposed SIT2-FRU is compared with the state-of-the-art activation functions Rectified Linear Unit (ReLU), Parametric ReLU (PReLU), and Exponential Linear Unit (ELU). The comparative experimental results and analyses clearly show an enhancement in the learning performance of DNNs that include activation layer(s) composed of SIT2-FRUs. The learning performance of the SIT2-FRU is shown to be robust to different settings of the learning rate and mini-batch size. Furthermore, the experimental results show that the SIT2-FRU can achieve high performance with or without batch normalization layers, unlike the other activation units employed. It is concluded that DNNs with SIT2-FRUs have satisfactory generalization capability and a robust, high learning performance when compared with the ReLU, PReLU, and ELU activation functions.
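
For readers unfamiliar with the baseline activation functions named in the abstract, the following is a minimal sketch of ReLU, PReLU, and ELU in their standard textbook forms. It also illustrates the abstract's point that units which respond to negative inputs keep the mean activation closer to zero. The abstract does not state the SIT2-FRU mapping, so it is not implemented here. The slope `a = 0.25`, the scale `alpha = 1.0`, the helper `mean_activation`, and the sample inputs are illustrative assumptions, not values taken from the paper.

```python
import math


def relu(x: float) -> float:
    """Rectified Linear Unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def prelu(x: float, a: float = 0.25) -> float:
    """Parametric ReLU: slope `a` (learned in practice; fixed here as an assumption)."""
    return x if x > 0 else a * x


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential Linear Unit: saturates toward -alpha for large negative inputs."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def mean_activation(fn, inputs):
    """Hypothetical helper: average output of `fn` over a list of inputs."""
    return sum(fn(x) for x in inputs) / len(inputs)


# Illustrative zero-mean sample, not data from the paper.
inputs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
means = {f.__name__: mean_activation(f, inputs) for f in (relu, prelu, elu)}

for name, value in means.items():
    print(f"{name}: mean activation = {value:.4f}")
```

On this symmetric sample, ReLU discards every negative input, so its mean activation is the largest of the three. PReLU and ELU produce negative outputs and therefore sit closer to zero, which is the property the abstract attributes to the SIT2-FRU as well.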