Multimodal soft biometrics: combining ear and face biometrics for age and gender classification

Yaman D., Eyiokur F. I., Ekenel H. K.

MULTIMEDIA TOOLS AND APPLICATIONS, vol.81, no.16, pp.22695-22713, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 81 Issue: 16
  • Publication Date: 2022
  • Doi Number: 10.1007/s11042-021-10630-8
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, FRANCIS, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Page Numbers: pp.22695-22713
  • Keywords: Multimodal learning, Multitask learning, Soft biometrics, Age estimation, Gender classification, Convolutional neural networks
  • Istanbul Technical University Affiliated: Yes


In this paper, we present a multimodal, multitask deep convolutional neural network framework for age and gender classification. In the developed framework, we have employed two different biometric modalities: ear and profile face. We have explored three different fusion methods, namely data, feature, and score fusion, to combine the information extracted from ear and profile face images. In the framework, we have utilized VGG-16 and ResNet-50 models trained with center loss to obtain more discriminative features. Moreover, we have performed two-stage fine-tuning to increase the representation capacity of the models. To assess the performance of the proposed approach, we have conducted extensive experiments on the FERET, UND-F, and UND-J2 datasets. Experimental results indicate that ear and profile face images contain useful cues for extracting soft biometric traits. We have shown that when a frontal face view of the subject is not available, ear and profile face images can be a good alternative for soft biometric recognition systems. The presented multimodal system achieves very high age and gender classification accuracies, matching those obtained with frontal face images. The multimodal approach has significantly outperformed both the unimodal approaches and the previous state-of-the-art profile face image or ear image-based age and gender classification methods in both tasks.
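The score-fusion strategy mentioned in the abstract can be illustrated with a minimal sketch: each modality's network produces class logits, which are turned into posteriors and combined by a weighted average. The logit values, the `w_ear` weight, and the function names below are hypothetical placeholders, not the paper's actual trained VGG-16/ResNet-50 outputs or fusion weights.

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax over the last axis
    e = np.exp(logits - np.max(logits, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def score_fusion(ear_logits, face_logits, w_ear=0.5):
    # weighted average of per-modality class posteriors
    # (hypothetical weight; the paper would tune/choose this)
    p_ear = softmax(ear_logits)
    p_face = softmax(face_logits)
    return w_ear * p_ear + (1.0 - w_ear) * p_face

# hypothetical two-class (gender) logits from the ear and
# profile-face branches
ear_logits = np.array([2.0, 0.5])
face_logits = np.array([0.2, 1.5])

fused = score_fusion(ear_logits, face_logits)
prediction = int(np.argmax(fused))
```

Feature fusion, by contrast, would concatenate the penultimate-layer features of the two branches before a shared classifier, and data fusion would combine the ear and profile face images at the input level; the score-level variant above is the simplest to sketch in isolation.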