Developing a novel approach for missing data imputation of solar radiation: A hybrid differential evolution algorithm based eXtreme gradient boosting model

Başakın E. E., Ekmekcioğlu Ö., Özger M.

Energy Conversion and Management, vol.280, 2023 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 280
  • Publication Date: 2023
  • Doi Number: 10.1016/j.enconman.2023.116780
  • Journal Name: Energy Conversion and Management
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, CAB Abstracts, Communication Abstracts, Computer & Applied Sciences, Environment Index, INSPEC, Pollution Abstracts, Veterinary Science Database, Civil Engineering Abstracts
  • Keywords: Differential evolution, Extreme gradient boosting, Meteorological measurements, Missing data imputation, Solar radiation
  • Istanbul Technical University Affiliated: Yes


© 2023 Elsevier Ltd

Having sufficient, high-quality datasets is of paramount importance for understanding the internal dynamics of nature-related phenomena. Given the need to keep datasets complete, this study introduces a novel technique that combines machine learning (ML) algorithms with a meta-heuristic optimization algorithm to impute gaps in measurements of solar radiation, a crucial meteorological variable for both climate dynamics and energy technologies. To this end, four gap sizes (5 %, 10 %, 20 %, and 30 %) were synthetically created, and the applicability of extreme gradient boosting (XGBoost) configured by differential evolution (DE) was examined for each gap size. The resulting model was benchmarked against conventional interpolation techniques (i.e., linear and spline) and other widely applied ML algorithms (i.e., random forest (RF) and multivariate adaptive regression splines). A multi-perspective input selection strategy based on correlation coefficients was used to model the missing values under three scenarios encompassing a total of 14 different models. The results revealed that the XGBoost-DE model built on the solar radiation measurements of neighboring stations performed best for the 5 % (NSE: 0.950; KGE: 0.967), 10 % (NSE: 0.934; KGE: 0.962), and 30 % (NSE: 0.939; KGE: 0.957) gap sizes, whereas for the 20 % gap size the highest accuracy was obtained with the RF (NSE: 0.944; KGE: 0.966). The interpolation techniques, by contrast, had the lowest imputation accuracy among all models for every gap size.
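The evaluation setup described in the abstract (synthetic gaps, an imputation method, and scoring with NSE and KGE) can be illustrated with a minimal Python sketch. It covers only the linear interpolation baseline, not the XGBoost-DE model. Everything specific here is an assumption made for illustration: the daily series is synthetic, the gaps are drawn at random positions (the abstract does not say how the gaps were placed), KGE is computed with the common formulation based on correlation, variability ratio, and bias ratio, and the helper names `make_gaps` and `linear_impute` are hypothetical.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed formulation: correlation r,
    variability ratio alpha, bias ratio beta)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def make_gaps(n, fraction, rng):
    """Hypothetical helper: boolean mask marking a random fraction of points
    as missing. Endpoints are kept so interpolation never extrapolates."""
    k = int(round(fraction * n))
    idx = rng.choice(np.arange(1, n - 1), size=k, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask


def linear_impute(series, mask):
    """Hypothetical helper: fill masked points by linear interpolation
    between the nearest observed neighbors."""
    x = np.arange(len(series))
    filled = np.array(series, dtype=float)
    filled[mask] = np.interp(x[mask], x[~mask], filled[~mask])
    return filled


# Synthetic stand-in for a daily solar radiation series (not real data).
rng = np.random.default_rng(0)
t = np.arange(365)
truth = 15 + 10 * np.sin(2 * np.pi * (t - 80) / 365) + rng.normal(0, 1.5, t.size)

for fraction in (0.05, 0.10, 0.20, 0.30):
    mask = make_gaps(truth.size, fraction, rng)
    filled = linear_impute(truth, mask)
    # Score only the imputed points against the withheld true values.
    print(f"gap {fraction:.0%}: NSE={nse(truth[mask], filled[mask]):.3f}, "
          f"KGE={kge(truth[mask], filled[mask]):.3f}")
```

Scoring only the withheld points keeps the untouched observations from inflating the metrics. The scores printed by this sketch depend entirely on the synthetic series and are not comparable to the values reported in the abstract.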