DocumentCode :
2347126
Title :
Identifying security bug reports via text mining: An industrial case study
Author :
Gegick, Michael ; Rotella, Pete ; Xie, Tao
fYear :
2010
fDate :
2-3 May 2010
Firstpage :
11
Lastpage :
20
Abstract :
A bug-tracking system such as Bugzilla contains bug reports (BRs) collected from various sources such as development teams, testing teams, and end users. When bug reporters submit bug reports to a bug-tracking system, the bug reporters need to label the bug reports as security bug reports (SBRs) or not, to indicate whether the involved bugs are security problems. These SBRs generally deserve higher priority in bug fixing than non-security bug reports (NSBRs). However, in the bug-reporting process, bug reporters often mislabel SBRs as NSBRs partly due to lack of security domain knowledge. This mislabeling could cause serious damage to software-system stakeholders due to the induced delay of identifying and fixing the involved security bugs. To address this important issue, we developed a new approach that applies text mining on natural-language descriptions of BRs to train a statistical model on already manually labeled BRs to identify SBRs that are manually mislabeled as NSBRs. Security engineers can use the model to automate the classification of BRs from large bug databases to reduce the time that they spend on searching for SBRs. We evaluated the model's predictions on a large Cisco software system with over ten million source lines of code. Among a sample of BRs that Cisco bug reporters manually labeled as NSBRs in bug reporting, our model successfully classified a high percentage (78%) of the SBRs as verified by Cisco security engineers, and predicted their classification as SBRs with a probability of at least 0.98.
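The abstract describes training a statistical model on the text of manually labeled BRs and then using it to flag NSBR-labeled reports that are likely SBRs. This record does not say which model the authors used, so the following is only a minimal sketch of that workflow under stated assumptions: it swaps in a multinomial naive Bayes classifier with Laplace smoothing over bag-of-words features, using only the Python standard library. The class `NaiveBayesBR`, the helper `flag_mislabeled`, the toy reports, and the default threshold are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a bug-report description and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesBR:
    """Hypothetical multinomial naive Bayes classifier with Laplace smoothing.

    Illustrative stand-in only; not the model used in the paper.
    """

    def fit(self, reports, labels):
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.labels}
        for text, label in zip(reports, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary is the union of tokens seen in any class.
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.labels}
        return self

    def predict_proba(self, text):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c in self.labels:
            score = math.log(self.doc_counts[c] / n_docs)  # log prior
            for w in tokenize(text):
                if w in self.vocab:  # ignore tokens never seen in training
                    score += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Log-sum-exp normalization turns log scores into posteriors.
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        return {c: math.exp(s - m) / z for c, s in log_scores.items()}


def flag_mislabeled(model, reports, labels, threshold=0.9):
    """Return NSBR-labeled reports whose predicted P(SBR) meets the threshold."""
    return [
        text
        for text, label in zip(reports, labels)
        if label == "NSBR" and model.predict_proba(text)["SBR"] >= threshold
    ]


# Toy, made-up training data for illustration only.
train_reports = [
    "buffer overflow allows remote code execution",
    "crash on malformed packet enables denial of service",
    "typo in help text",
    "button misaligned in settings page",
]
train_labels = ["SBR", "SBR", "NSBR", "NSBR"]
model = NaiveBayesBR().fit(train_reports, train_labels)
```

Raising the threshold trades recall for a shorter list of flagged reports to review manually; the value here is an arbitrary placeholder.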
Keywords :
data mining; program debugging; security of data; statistical analysis; text analysis; Bugzilla; Cisco security engineers; Cisco software system; bug-tracking system; development teams; end users; industrial case study; natural language descriptions; security bug reports; software system stakeholders; statistical model; testing teams; text mining; Computer bugs; Data engineering; Data security; Databases; Delay; Mining industry; Predictive models; Software systems; System testing; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Mining Software Repositories (MSR), 2010 7th IEEE Working Conference on
Conference_Location :
Cape Town
Print_ISBN :
978-1-4244-6802-7
Electronic_ISBN :
978-1-4244-6803-4
Type :
conf
DOI :
10.1109/MSR.2010.5463340
Filename :
5463340