An industrial case study of classifier ensembles for locating software defects

Misirli A., Bener A. B., Turhan B.

SOFTWARE QUALITY JOURNAL, vol.19, no.3, pp.515-536, 2011 (Journal Indexed in SCI)

  • Publication Type: Article
  • Volume: 19 Issue: 3
  • Publication Date: 2011
  • Doi Number: 10.1007/s11219-010-9128-1
  • Page Numbers: pp.515-536


As the application layer in embedded systems dominates over the hardware, ensuring software quality becomes a real challenge. Software testing is the most time-consuming and costly project phase, particularly in the embedded software domain. Misclassifying safe code as defective increases project costs and hence leads to low margins. In this research, we present a defect prediction model based on an ensemble of classifiers. We collaborated with an industrial partner from the embedded systems domain and applied our generic defect prediction models to data from embedded projects. The embedded systems domain is similar to mission-critical software in that the goal is to catch as many defects as possible. Therefore, a predictor is expected to achieve a very high probability of detection (pd). On the other hand, most embedded systems in practice are commercial products, and companies would like to lower their costs to remain competitive in their market by keeping their false alarm (pf) rates as low as possible and improving their precision rates. In our experiments, we used data collected from our industry partners as well as publicly available data. Our results reveal that an ensemble of classifiers significantly decreases pf to 15% while increasing precision by 43%, keeping balance rates at 74%. The cost-benefit analysis of the proposed model shows that inspecting 23% of the code on local datasets is enough to detect around 70% of defects.
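
The abstract reports pd, pf, precision, and balance without defining them. As a minimal sketch, the snippet below computes these measures from a confusion matrix, assuming the definitions commonly used in the defect prediction literature: pd is recall on defective modules, pf is the false positive rate, and balance is one minus the normalized Euclidean distance from the ideal point (pf = 0, pd = 1). The function name `defect_metrics` and the example counts are hypothetical, and the balance formula is an assumption that the abstract itself does not confirm.

```python
import math


def defect_metrics(tp, fn, fp, tn):
    """Compute pd, pf, precision, and balance from confusion-matrix counts.

    tp: defective modules predicted defective
    fn: defective modules predicted defect-free
    fp: defect-free modules predicted defective (false alarms)
    tn: defect-free modules predicted defect-free
    """
    # Probability of detection: share of defective modules that were caught.
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # Probability of false alarm: share of defect-free modules flagged.
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # Precision: share of flagged modules that are actually defective.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Balance (assumed definition): 1 minus the normalized distance
    # from the ideal point (pf = 0, pd = 1).
    balance = 1 - math.sqrt((0 - pf) ** 2 + (1 - pd) ** 2) / math.sqrt(2)
    return {"pd": pd, "pf": pf, "precision": precision, "balance": balance}


# Made-up counts for illustration only; these are not data from the paper.
metrics = defect_metrics(tp=40, fn=10, fp=20, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed definitions, lowering pf while holding pd steady moves a predictor closer to the ideal point and raises balance, which matches the trade-off the abstract describes.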