Predicting Software Vulnerabilities Using Topic Modeling with Issues

Bulut F. G., ALTUNEL H., TOSUN A.

4th International Conference on Computer Science and Engineering, 11 - 15 September 2019

  • Publication Type: Conference Paper / Full Text
  • DOI Number: 10.1109/ubmk.2019.8907170
  • Keywords: software vulnerability prediction, topic modeling, bug report, issue record, textual description


The existence of software vulnerabilities is an indicator of the reliability and safety of software products. Software vulnerabilities can be predicted using metrics derived from developers, organization, code, and textual data. In this work, we aim to predict software vulnerabilities using issue records from two different datasets. The first dataset consists of six months of issue records collected at a corporate company, whereas the second dataset consists of Wireshark project bug records from 2017 to 2018. Prediction models were built using six different machine learning algorithms, for which the textual descriptions of issue records were converted into topic models. A regression model was built for the corporate dataset, in which the textual descriptions of issue records were used as the input and the number of vulnerabilities was used as the output. A classification model was built for the Wireshark dataset, in which the textual descriptions of bug records were used as the input and the class (vulnerability-prone or not) was used as the output. The best regression model results are MdMRE values of 0.23, 0.30, and 0.44, respectively. The best classification model result is a recall score of 74%.
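The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions: MdMRE as the median of |actual - predicted| / actual, and recall as true positives divided by all actual positives. The function names and the sample values below are hypothetical and are not taken from the paper's datasets. Skipping records with an actual count of zero is an assumption of this sketch, not a documented choice of the authors.

```python
from statistics import median


def mdmre(actual, predicted):
    """Median magnitude of relative error: median(|y - y_hat| / y)."""
    # Assumption: records with an actual value of zero are skipped,
    # because the relative error is undefined for them.
    errors = [abs(a - p) / a for a, p in zip(actual, predicted) if a != 0]
    if not errors:
        raise ValueError("no records with a non-zero actual value")
    return median(errors)


def recall(actual, predicted):
    """Share of truly vulnerability-prone records that the model flags."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a and not p)
    if true_pos + false_neg == 0:
        raise ValueError("no positive records in the actual labels")
    return true_pos / (true_pos + false_neg)


# Hypothetical regression output: vulnerability counts per period.
actual_counts = [4, 10, 5, 8]
predicted_counts = [3, 12, 5, 6]

# Hypothetical classification output: 1 = vulnerability-prone, 0 = not.
actual_labels = [1, 1, 1, 1, 0, 0]
predicted_labels = [1, 1, 1, 0, 1, 0]

if __name__ == "__main__":
    print("MdMRE:", mdmre(actual_counts, predicted_counts))
    print("Recall:", recall(actual_labels, predicted_labels))
```

A lower MdMRE means the predicted counts are closer to the actual counts in relative terms, while a higher recall means fewer vulnerability-prone records are missed.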