Predicting Cost Impacts of Nonconformances in Construction Projects Using Interpretable Machine Learning

Koç K., Budayan C., Ekmekcioğlu Ö., Tokdemir O. B.

Journal of Construction Engineering and Management, vol.150, no.1, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 150 Issue: 1
  • Publication Date: 2024
  • Doi Number: 10.1061/jcemd4.coeng-13857
  • Journal Name: Journal of Construction Engineering and Management
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, ICONDA Bibliographic, INSPEC, Metadex, Public Affairs Index, DIALNET, Civil Engineering Abstracts
  • Keywords: Cost of quality, Explainable artificial intelligence, Nonconformance (NCR), Quality failures, Tree-based ensemble model
  • Istanbul Technical University Affiliated: Yes


Nonconformance (NCR) has long been a subject of research interest for its potential to extrapolate information leading to a more productive environment in construction projects. Despite a variety of traditional attempts, a systematic understanding of how machine learning (ML) approaches can contribute to proactively detecting the severity of NCRs remains limited. This study aims to develop a data-driven ML framework to predict the cost impacts of NCRs (high severity versus low severity) in construction projects. To accomplish this aim, the random forest (RF) algorithm, reinforced with a metaheuristic hyperparameter-tuning strategy, namely the gravitational search algorithm (GSA), is adopted for the binary classification problem. Furthermore, this study incorporates Shapley additive explanations (SHAP) into the GSA-RF predictive framework to provide transparent interpretations and to tackle the inherent black-box nature of the ML rationale. The results reveal that the proposed model detects the severity of NCRs in terms of their cost impact with an overall AUROC value of 0.776 for the preseparated and blinded testing set. This indicates that the proposed model can be used confidently for newly introduced datasets from real-life cases. In addition, the SHAP analysis results emphasize the role of season, inadequate application procedure, and NCR type in detecting the severity of NCRs. Overall, this research not only makes an important contribution through its novel data-driven approaches but also provides insights for project managers concerning productivity improvements in the sector.
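As an aid to reading the reported AUROC, the following is a minimal sketch of how that metric can be computed for a binary high-severity versus low-severity classifier. It is not the authors' code: the function name `auroc` and the toy labels and scores are hypothetical and were chosen only for illustration. The sketch uses the rank interpretation of AUROC, that is, the probability that a randomly chosen high-severity case receives a higher score than a randomly chosen low-severity case.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive (label 1) is scored
    above a random negative (label 0); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study): 1 = high severity, 0 = low severity.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_auc = auroc(y_true, y_score)
print(f"AUROC on toy data: {toy_auc:.3f}")
```

On this toy set, 7 of the 9 positive-negative pairs are ranked correctly, so the value is 7/9, roughly 0.778. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives some context for the 0.776 reported on the testing set.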