• DocumentCode
    668563
  • Title

    Multi-level data pre-processing for software defect prediction

  • Author

    Armah, Gabriel Kofi ; Guangchun Luo ; Ke Qin

  • Author_Institution
    Comput. Sci. Dept., Univ. for Dev. Studies (UDS), Navrongo, Ghana
  • Volume
    2
  • fYear
    2013
  • fDate
    23-24 Nov. 2013
  • Firstpage
    170
  • Lastpage
    174
  • Abstract
    Early detection of defective software components enables verification experts to devote more time and allocate scarce resources to the problem areas of the system under development. This is the usefulness of defect prediction: when defects are detected at the early stages, defect prediction streamlines testing efforts and reduces the development cost of software. An important step in building effective predictive models is to apply one or more sampling techniques. A model is considered effective if it classifies defective and non-defective modules as accurately as possible. In this paper we considered the outcome of data pre-processing by filtering and compared its performance with that on the non-pre-processed original datasets. We compared the performance of four different K-Nearest Neighbor classifiers (KNN-LWL, KStar, IBK and IB1) with Non-Nested Generalized Exemplars (NNGE), Random Tree and Random Forest. We observed that our multi-level data pre-processing, which includes double attribute selection and tripartite instance filtering, enhanced the defect prediction results. We also observed that these two filtering methods improved the prediction results independently, i.e., when using attribute selection only and when using resampling filtering only. The excellent performance achieved can be attributed to the removal of irrelevant attributes by dimension reduction, while resampling handled the problem of class imbalance; together these led to the improved performance of the classifiers considered. NNGE, as its name implies, avoided generalization on some of the datasets, namely those with more than 2,000 instances (JM1 = 10,885 and KC1 = 2,109), when using pre-processing; this may be due to conflicting instances. We also used the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) measures to check the effectiveness of our model.
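
    Note: the abstract describes a pipeline of attribute selection plus instance resampling feeding lazy (KNN-family) classifiers, evaluated with accuracy, MAE and RMSE. A minimal sketch of that kind of pipeline, assuming the WEKA toolkit (whose lazy learners IBk, IB1, KStar and LWL match the classifier names above) and a hypothetical NASA MDP ARFF file path, might look as follows; it shows a single attribute-selection pass and a single resample step, not the paper's exact double attribute selection and tripartite instance filtering configuration.

        import java.util.Random;

        import weka.attributeSelection.BestFirst;
        import weka.attributeSelection.CfsSubsetEval;
        import weka.classifiers.Evaluation;
        import weka.classifiers.lazy.IBk;
        import weka.core.Instances;
        import weka.core.converters.ConverterUtils.DataSource;
        import weka.filters.Filter;
        import weka.filters.supervised.attribute.AttributeSelection;
        import weka.filters.supervised.instance.Resample;

        public class DefectPredictionSketch {
            public static void main(String[] args) throws Exception {
                // Hypothetical path to a NASA MDP defect dataset (e.g. KC1) in ARFF format.
                Instances data = DataSource.read("kc1.arff");
                data.setClassIndex(data.numAttributes() - 1);  // defect label as last attribute

                // Step 1: attribute selection (dimension reduction) -- CFS subset evaluation
                // with best-first search; the paper applies attribute selection twice.
                AttributeSelection attrSel = new AttributeSelection();
                attrSel.setEvaluator(new CfsSubsetEval());
                attrSel.setSearch(new BestFirst());
                attrSel.setInputFormat(data);
                Instances reduced = Filter.useFilter(data, attrSel);

                // Step 2: supervised resampling biased toward a uniform class distribution,
                // easing the class-imbalance problem mentioned in the abstract.
                Resample resample = new Resample();
                resample.setBiasToUniformClass(1.0);
                resample.setSampleSizePercent(100.0);
                resample.setInputFormat(reduced);
                Instances balanced = Filter.useFilter(reduced, resample);

                // Step 3: 10-fold cross-validation of a KNN-style classifier (IBk, k = 3),
                // reporting accuracy, MAE and RMSE as in the paper's evaluation.
                Evaluation eval = new Evaluation(balanced);
                eval.crossValidateModel(new IBk(3), balanced, 10, new Random(1));
                System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());
                System.out.printf("MAE:      %.4f%n", eval.meanAbsoluteError());
                System.out.printf("RMSE:     %.4f%n", eval.rootMeanSquaredError());
            }
        }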
  • Keywords
    data reduction; information filtering; learning (artificial intelligence); pattern classification; program testing; sampling methods; software quality; trees (mathematics); IB1 classifiers; IBK; KNN-LWL; KStar; MAE; NNGE; RMSE measures; defective module classification; defective software component early detection; dimension reduction; double attribute selection; k-nearest neighbor classification; mean absolute error; multilevel data preprocessing; nondefective module classification; nonnested generalized exemplars; random forest; random tree; resampling filtering; root mean squared error; sampling techniques; scarce resource allocation; software defect prediction streamlined testing; software development cost reduction; software quality; tripartite instance filtering; Accuracy; Filtering; Measurement; Predictive models; Software; Software engineering; Training; ROC; classifiers; instances; multi-level data; pre-processing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering (ICIII)
  • Conference_Location
    Xi'an
  • Print_ISBN
    978-1-4799-3985-5
  • Type

    conf

  • DOI
    10.1109/ICIII.2013.6703111
  • Filename
    6703111