Explainable Artificial Intelligence for Cotton Yield Prediction With Multisource Data


Çelik M. F., Işık M. S., Taşkın G., Erten E., Camps-Valls G.

IEEE Geoscience and Remote Sensing Letters, vol. 20, 2023 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 20
  • Publication Date: 2023
  • DOI: 10.1109/lgrs.2023.3303643
  • Journal Name: IEEE Geoscience and Remote Sensing Letters
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Aquatic Science & Fisheries Abstracts (ASFA), Communication Abstracts, Compendex, Geobase, INSPEC, Metadex, Civil Engineering Abstracts
  • Keywords: Agriculture, cotton, explainable artificial intelligence (XAI), explainable boosting machine (EBM), feature selection, glass-box method, interpretability, yield estimation
  • Istanbul Technical University Affiliated: Yes

Abstract

Cotton plays an essential role in the global textile industry and is threatened by climate and ecosystem change, which makes yield prediction important for both economics and sustainability. Potential cotton yield can be predicted by integrating climatic factors, soil parameters, and biophysical parameters observed by remote sensing satellites with high temporal and spatial resolution. This study used a multisource dataset to build an explainable and accurate model for cotton yield prediction over the continental United States (CONUS). A recently proposed glass-box method, the explainable boosting machine (EBM), was implemented for its transparency, reliability, and ease of interpretation, and its accuracy was compared with that of common machine learning (ML) methods for predicting cotton yield. The EBM was more accurate than the other glass-box methods and competitive with black-box models. The EBM also revealed the importance of individual features and their pairwise interactions without any post hoc methods. Precipitation (P), enhanced vegetation index (EVI), and leaf area index (LAI) were the three most important dynamic features. Dynamic features drive the model, accounting for 78% of the overall feature importance, followed by pairwise feature interactions at 16% and static features at 6%. The study highlights the value of multisource data, of modeling interactions among input features, and of an interpretable model for understanding the inner dynamics of cotton yield prediction.
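The grouped percentages reported in the abstract (dynamic features, pairwise interactions, static features) can be read as each group's share of the summed per-term importances. The following is a minimal sketch of that arithmetic only, not the authors' code. The function name, the group labels, the `soil_ph` feature, and all numeric values are hypothetical toy inputs chosen for illustration; they are not results from the study.

```python
from collections import defaultdict


def group_importance_shares(term_importances, term_groups):
    """Return each group's percentage share of the total term importance.

    term_importances: dict mapping a term name to a non-negative importance
        score (assumed here to be something like a mean absolute contribution).
    term_groups: dict mapping the same term names to a group label.
    """
    totals = defaultdict(float)
    for term, importance in term_importances.items():
        totals[term_groups[term]] += importance

    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("total importance must be positive")

    return {group: 100.0 * value / grand_total for group, value in totals.items()}


# Toy inputs (hypothetical, NOT values from the paper).
toy_importances = {
    "precipitation": 4.0,
    "evi": 3.0,
    "lai": 2.0,
    "soil_ph": 0.5,       # hypothetical static feature
    "evi x lai": 0.5,     # hypothetical pairwise interaction term
}
toy_groups = {
    "precipitation": "dynamic",
    "evi": "dynamic",
    "lai": "dynamic",
    "soil_ph": "static",
    "evi x lai": "interaction",
}

shares = group_importance_shares(toy_importances, toy_groups)
for group, share in sorted(shares.items()):
    print(f"{group}: {share:.1f}%")
```

With a fitted glass-box model, the toy dictionary would be replaced by the model's own per-term importance scores; how those scores are defined depends on the library and is an assumption here.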