• DocumentCode
    668563
  • Title

    Multi-level data pre-processing for software defect prediction

  • Author

    Armah, Gabriel Kofi ; Guangchun Luo ; Ke Qin

  • Author_Institution
    Comput. Sci. Dept., Univ. for Dev. Studies (UDS), Navrongo, Ghana
  • Volume
    2
  • fYear
    2013
  • fDate
    23-24 Nov. 2013
  • Firstpage
    170
  • Lastpage
    174
  • Abstract
    Early detection of defective software components enables verification experts to devote more time and allocate scarce resources to the problem areas of the system under development. This is the usefulness of defect prediction: when defects are detected at the early stages, defect prediction streamlines testing efforts and reduces the development cost of software. An important step in building effective predictive models is to apply one or more sampling techniques. A model is considered effective if it classifies defective and non-defective modules as accurately as possible. In this paper we considered the outcome of data pre-processing by filtering and compared its performance with that on the non-pre-processed original datasets. We compared the performance of four different K-Nearest Neighbor classifiers (KNN-LWL, KStar, IBK and IB1) with Non-Nested Generalized Exemplars (NNGE), Random Tree and Random Forest. We observed that our multi-level data pre-processing, which includes double attribute selection and tripartite instance filtering, enhanced the defect prediction results. We also observed that these two filtering methods improved the prediction results independently, i.e., when using attribute selection only and when using resampling filtering only. The excellent performance achieved can be attributed to the removal of irrelevant attributes by dimension reduction, while resampling handled the problem of class imbalance; together these led to the improved performance of the classifiers considered. NNGE, as its name implies, avoided generalization on some of the datasets, namely those with more than 2,000 instances (JM1 = 10,885 and KC1 = 2,109), when using pre-processing; this may be due to conflicting instances. We also used the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) measures to check the effectiveness of our model.
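
    Note: the abstract describes a pipeline of attribute selection plus instance resampling feeding lazy (KNN-family) classifiers, evaluated with accuracy, MAE and RMSE. A minimal sketch of that kind of pipeline, assuming the WEKA toolkit (whose lazy learners IBk, IB1, KStar and LWL match the classifier names above) and a hypothetical NASA MDP ARFF file path, might look as follows; it shows a single attribute-selection pass and a single resample step, not the paper's exact double attribute selection and tripartite instance filtering configuration.

        import java.util.Random;

        import weka.attributeSelection.BestFirst;
        import weka.attributeSelection.CfsSubsetEval;
        import weka.classifiers.Evaluation;
        import weka.classifiers.lazy.IBk;
        import weka.core.Instances;
        import weka.core.converters.ConverterUtils.DataSource;
        import weka.filters.Filter;
        import weka.filters.supervised.attribute.AttributeSelection;
        import weka.filters.supervised.instance.Resample;

        public class DefectPredictionSketch {
            public static void main(String[] args) throws Exception {
                // Hypothetical path to a NASA MDP defect dataset (e.g. KC1) in ARFF format.
                Instances data = DataSource.read("kc1.arff");
                data.setClassIndex(data.numAttributes() - 1);  // defect label as last attribute

                // Step 1: attribute selection (dimension reduction) -- CFS subset evaluation
                // with best-first search; the paper applies attribute selection twice.
                AttributeSelection attrSel = new AttributeSelection();
                attrSel.setEvaluator(new CfsSubsetEval());
                attrSel.setSearch(new BestFirst());
                attrSel.setInputFormat(data);
                Instances reduced = Filter.useFilter(data, attrSel);

                // Step 2: supervised resampling biased toward a uniform class distribution,
                // easing the class-imbalance problem mentioned in the abstract.
                Resample resample = new Resample();
                resample.setBiasToUniformClass(1.0);
                resample.setSampleSizePercent(100.0);
                resample.setInputFormat(reduced);
                Instances balanced = Filter.useFilter(reduced, resample);

                // Step 3: 10-fold cross-validation of a KNN-style classifier (IBk, k = 3),
                // reporting accuracy, MAE and RMSE as in the paper's evaluation.
                Evaluation eval = new Evaluation(balanced);
                eval.crossValidateModel(new IBk(3), balanced, 10, new Random(1));
                System.out.printf("Accuracy: %.2f%%%n", eval.pctCorrect());
                System.out.printf("MAE:      %.4f%n", eval.meanAbsoluteError());
                System.out.printf("RMSE:     %.4f%n", eval.rootMeanSquaredError());
            }
        }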
  • Keywords
    data reduction; information filtering; learning (artificial intelligence); pattern classification; program testing; sampling methods; software quality; trees (mathematics); IB1 classifiers; IBK; KNN-LWL; KStar; MAE; NNGE; RMSE measures; defective module classification; defective software component early detection; dimension reduction; double attribute selection; k-nearest neighbor classification; mean absolute error; multilevel data preprocessing; nondefective module classification; nonnested generalized exemplars; random forest; random tree; resampling filtering; root mean squared error; sampling techniques; scarce resource allocation; software defect prediction streamlined testing; software development cost reduction; software quality; tripartite instance filtering; Accuracy; Filtering; Measurement; Predictive models; Software; Software engineering; Training; ROC; classifiers; instances; multi-level data; pre-processing;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    2013 6th International Conference on Information Management, Innovation Management and Industrial Engineering (ICIII)
  • Conference_Location
    Xi'an
  • Print_ISBN
    978-1-4799-3985-5
  • Type

    conf

  • DOI
    10.1109/ICIII.2013.6703111
  • Filename
    6703111