Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities
Software security failures are common and the problem is growing. A vulnerability is a weakness in the software that, when exploited, causes a security failure. It is difficult to detect vulnerabilities until they manifest themselves as security failures in the operational stage of software, because security concerns are often not addressed or known sufficiently early during the software development life cycle. Numerous studies have shown that complexity, coupling, and cohesion (CCC) related structural metrics are important indicators of the quality of software architecture, and software architecture is one of the most important and early design decisions that influences the final quality of the software system. Although these metrics have been successfully employed to indicate software faults in general, there are no systematic guidelines on how to use these metrics to predict vulnerabilities in software. If CCC metrics can be used to indicate vulnerabilities, these metrics could aid in the conception of more secured architecture, leading to more secured design and code and eventually better software. In this paper, we present a framework to automatically predict vulnerabilities based on CCC metrics. To empirically validate the framework and prediction accuracy, we conduct a large empirical study on fifty-two releases of Mozilla Firefox developed over a period of four years. To build vulnerability predictors, we consider four alternative data mining and statistical techniques – C4.5 Decision Tree, Random Forests, Logistic Regression, and Naïve-Bayes – and compare their prediction performances. We are able to correctly predict majority of the vulnerability-prone files in Mozilla Firefox, with tolerable false positive rates. Moreover, the predictors built from the past releases can reliably predict the likelihood of having vulnerabilities in the future releases. The experimental results indicate that structural information from the non-security realm such as complexity, coupling, and cohesion are useful in vulnerability prediction.