Title :
On mining data across software repositories
Author :
Anbalagan, Prasanth ; Vouk, Mladen
Author_Institution :
Dept. of Comput. Sci., North Carolina State Univ., Raleigh, NC
Abstract :
Software repositories provide an abundance of valuable information about open source projects. With the increase in the size of the data maintained by the repositories, automated extraction of such data from individual repositories, as well as of linked information across repositories, has become a necessity. In this paper we describe a framework that uses web scraping to automatically mine repositories and link information across repositories. We discuss two implementations of the framework. In the first implementation, we automatically identify and collect security problem reports from project repositories that deploy the Bugzilla bug tracker, using related vulnerability information from the National Vulnerability Database. In the second, we collect security problem reports for projects that deploy the Launchpad bug tracker, along with related vulnerability information from the National Vulnerability Database. We have evaluated our tool on various releases of the Fedora, Ubuntu, Suse, RedHat, and Firefox projects. The percentage of security bugs identified using our tool is consistent with that reported by other researchers.
Keywords :
Internet; data mining; program debugging; public domain software; Bugzilla bug tracker; Launchpad bug tracker; Web scraping; open source projects; software repositories; Computer bugs; Computer science; Data security; Databases; Government; Information retrieval; Information security; National security; Open source software
Conference_Title :
6th IEEE International Working Conference on Mining Software Repositories (MSR '09), 2009
Conference_Location :
Vancouver, BC
Print_ISBN :
978-1-4244-3493-0
DOI :
10.1109/MSR.2009.5069498