  • DocumentCode
    3078942
  • Title
    An empirical investigation of filter attribute selection techniques for software quality classification
  • Author
    Gao, Kehan ; Khoshgoftaar, Taghi M. ; Wang, Huanjing
  • Author_Institution
    Eastern Connecticut State Univ., Willimantic, CT, USA
  • fYear
    2009
  • fDate
    10-12 Aug. 2009
  • Firstpage
    272
  • Lastpage
    277
  • Abstract
    Attribute selection is an important activity in data preprocessing for software quality modeling and other data mining problems. Software quality models have been used to improve the fault detection process. Finding faulty components in a software system during the early stages of the software development process can lead to a more reliable final product and can reduce development and maintenance costs. Some studies have shown that the prediction accuracy of the models improves when irrelevant and redundant features are removed from the original data set. In this study, we investigated four filter attribute selection techniques, automatic hybrid search (AHS), rough sets (RS), Kolmogorov-Smirnov (KS), and probabilistic search (PS), and applied them to a very large telecommunications software system. To evaluate classification performance on the smaller attribute subsets selected by the different approaches, we built several classification models using five different classifiers. The empirical results demonstrated that by applying an attribute selection approach we can build classification models with an accuracy comparable to that of models built with the complete set of attributes. Moreover, the smaller subset contains less than 15 percent of the complete set of attributes. Therefore, the metrics collection, model calibration, model validation, and model evaluation times of future software development efforts for similar systems can be significantly reduced. In addition, we demonstrated that our recently proposed attribute selection technique, KS, outperformed the other three attribute selection techniques.
  • Keywords
    data mining; rough set theory; software fault tolerance; software maintenance; software quality; Kolmogorov-Smirnov; automatic hybrid search; data mining problems; data preprocessing; fault detection process; filter attribute selection techniques; probabilistic search; rough sets; software development process; software quality classification; software quality modeling; Accuracy; Costs; Data mining; Data preprocessing; Fault detection; Filters; Maintenance; Programming; Software quality; Software systems;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Reuse & Integration, 2009. IRI '09. IEEE International Conference on
  • Conference_Location
    Las Vegas, NV
  • Print_ISBN
    978-1-4244-4114-3
  • Electronic_ISBN
    978-1-4244-4116-7
  • Type
    conf
  • DOI
    10.1109/IRI.2009.5211564
  • Filename
    5211564