Developing a novel approach for missing data imputation of solar radiation: A hybrid differential evolution algorithm based eXtreme gradient boosting model


Başakın E. E., Ekmekcioğlu Ö., Özger M.

Energy Conversion and Management, vol.280, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 280
  • Publication Date: 2023
  • DOI Number: 10.1016/j.enconman.2023.116780
  • Journal Name: Energy Conversion and Management
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, CAB Abstracts, Communication Abstracts, Computer & Applied Sciences, Environment Index, INSPEC, Pollution Abstracts, Veterinary Science Database, Civil Engineering Abstracts
  • Keywords: Differential evolution, Extreme gradient boosting, Meteorological measurements, Missing data imputation, Solar radiation
  • Affiliated with Istanbul Technical University: Yes

Abstract

© 2023 Elsevier Ltd. Sufficient, high-quality datasets are of paramount importance for understanding the internal dynamics of nature-related phenomena. Given the need to keep datasets complete, this study introduces a novel technique that combines machine learning (ML) algorithms with a meta-heuristic optimization algorithm to impute gaps in measurements of solar radiation, a crucial meteorological variable for both climate dynamics and energy technologies. To this end, four gap sizes (5 %, 10 %, 20 %, and 30 %) were synthetically generated, and the applicability of extreme gradient boosting (XGBoost) configured by differential evolution (DE) was examined for each gap size. The resulting model was benchmarked against conventional interpolation techniques (linear and spline) and other widely applied ML algorithms (random forest (RF) and multivariate adaptive regression splines). A multi-perspective input selection strategy based on correlation coefficients was used to model the missing values under three scenarios encompassing a total of 14 different models. The results revealed that the XGBoost-DE model built on the solar radiation measurements of neighboring stations performed best for the 5 % (NSE: 0.950; KGE: 0.967), 10 % (NSE: 0.934; KGE: 0.962), and 30 % (NSE: 0.939; KGE: 0.957) gap sizes, whereas for the 20 % gap size the highest accuracy was obtained with RF (NSE: 0.944; KGE: 0.966). The interpolation techniques, by contrast, had the lowest accuracies of all the methods for every gap size.
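The evaluation protocol described in the abstract (mask a fixed share of a series, impute it, score the imputed values) can be illustrated with a minimal sketch. It is not the authors' code. It assumes that NSE and KGE denote the Nash-Sutcliffe and Kling-Gupta efficiencies (the latter in its 2009 form), uses a made-up sinusoidal series in place of real solar radiation data, and shows only the linear interpolation baseline, not the XGBoost-DE model. The helper names `make_gaps` and `linear_fill` are hypothetical.

```python
import math
import random


def nse(obs, sim):
    """Nash-Sutcliffe efficiency; 1 means a perfect fit."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed 2009 form); 1 means a perfect fit."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / n)
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / n)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / n
    r = cov / (so * ss)   # linear correlation
    alpha = ss / so       # variability ratio
    beta = ms / mo        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def make_gaps(series, fraction, rng):
    """Copy the series and set `fraction` of its interior values to None."""
    n_missing = round(fraction * len(series))
    idx = rng.sample(range(1, len(series) - 1), n_missing)
    gapped = list(series)
    for i in idx:
        gapped[i] = None
    return gapped, sorted(idx)


def linear_fill(gapped):
    """Fill None values by linear interpolation between known neighbors."""
    filled = list(gapped)
    known = [i for i, v in enumerate(gapped) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            w = (i - left) / (right - left)
            filled[i] = gapped[left] + w * (gapped[right] - gapped[left])
    return filled


# Synthetic daily series (illustrative only, not measured data).
rng = random.Random(0)
truth = [200 + 100 * math.sin(2 * math.pi * d / 365) + rng.gauss(0, 15)
         for d in range(365)]

# Same four gap sizes as in the abstract; score only the masked positions.
for frac in (0.05, 0.10, 0.20, 0.30):
    gapped, idx = make_gaps(truth, frac, rng)
    filled = linear_fill(gapped)
    obs = [truth[i] for i in idx]
    sim = [filled[i] for i in idx]
    print(f"{frac:.0%}: NSE={nse(obs, sim):.3f} KGE={kge(obs, sim):.3f}")
```

Scoring only the masked positions keeps the untouched observations from inflating the metrics. A learned imputer such as a gradient boosting regressor could be swapped in for `linear_fill` under the same masking and scoring steps.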