Energy Conversion and Management, vol.280, 2023 (SCI-Expanded)
© 2023 Elsevier Ltd
Sufficient, high-quality datasets are essential for understanding the internal dynamics of natural phenomena. Given the need to keep such datasets complete, this study introduces a novel technique that combines machine learning (ML) algorithms with a meta-heuristic optimization algorithm to impute gaps in measurements of solar radiation, a crucial meteorological variable for both climate dynamics and energy technologies. To this end, four gap sizes (5 %, 10 %, 20 %, and 30 %) were synthetically generated, and the applicability of extreme gradient boosting (XGBoost) configured by differential evolution (DE) was examined for each gap size. The resulting model was benchmarked against conventional interpolation techniques (i.e., linear and spline interpolation) and other widely applied ML algorithms (i.e., random forest (RF) and multivariate adaptive regression splines). A multi-perspective input selection strategy based on correlation coefficients was used to model the missing values under three scenarios encompassing a total of 14 different models. The results revealed that the XGBoost-DE model built with the solar radiation measurements of neighboring stations performed best for gap sizes of 5 % (NSE: 0.950; KGE: 0.967), 10 % (NSE: 0.934; KGE: 0.962), and 30 % (NSE: 0.939; KGE: 0.957); for the 20 % gap size, the highest accuracy was obtained with RF (NSE: 0.944; KGE: 0.966). The interpolation techniques, by contrast, had the lowest accuracies among all approaches for every gap size.
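The evaluation protocol described above (synthetic gaps, a linear interpolation baseline, and NSE/KGE scoring) can be illustrated with a minimal sketch. It is not the authors' implementation. The sinusoidal series is invented stand-in data, the helper names (`make_gaps`, `linear_fill`, `nse`, `kge`) are hypothetical, gaps are assumed to be drawn uniformly at random, and NSE and KGE are assumed to denote the Nash-Sutcliffe efficiency and the Kling-Gupta efficiency in its original 2009 form; the abstract does not state which variant was used.

```python
import numpy as np


def make_gaps(series, fraction, rng):
    """Blank out `fraction` of the values at random; return (gappy copy, gap mask)."""
    n = series.size
    n_missing = int(round(fraction * n))
    # Assumption: endpoints are kept so linear interpolation is defined everywhere.
    idx = rng.choice(np.arange(1, n - 1), size=n_missing, replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    gappy = series.astype(float).copy()
    gappy[mask] = np.nan
    return gappy, mask


def linear_fill(gappy):
    """Baseline imputation: linear interpolation between the nearest known values."""
    x = np.arange(gappy.size)
    known = ~np.isnan(gappy)
    filled = gappy.copy()
    filled[~known] = np.interp(x[~known], x[known], gappy[known])
    return filled


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect match."""
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs, sim):
    """Kling-Gupta efficiency (2009 form): 1 is a perfect match."""
    r = np.corrcoef(obs, sim)[0, 1]      # linear correlation
    alpha = sim.std() / obs.std()        # variability ratio
    beta = sim.mean() / obs.mean()       # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Invented stand-in for a daily solar radiation series (not real measurements).
rng = np.random.default_rng(0)
days = np.arange(730)
radiation = 15 + 10 * np.sin(2 * np.pi * days / 365) + rng.normal(0, 2, days.size)

for fraction in (0.05, 0.10, 0.20, 0.30):
    gappy, mask = make_gaps(radiation, fraction, rng)
    filled = linear_fill(gappy)
    # Score only the positions that were artificially removed.
    print(f"{fraction:.0%} gaps: NSE={nse(radiation[mask], filled[mask]):.3f}, "
          f"KGE={kge(radiation[mask], filled[mask]):.3f}")
```

In a setup like the one the abstract describes, `linear_fill` would be replaced by a trained regressor (for example XGBoost with DE-tuned hyperparameters) that predicts the masked values from neighboring-station inputs, while the masking and scoring steps stay the same.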