DocumentCode :
589294
Title :
A Comparative Study on the Stability of Software Metric Selection Techniques
Author :
Wang, Huanjing ; Khoshgoftaar, Taghi M. ; Wald, Randall ; Napolitano, Antonio
Volume :
2
fYear :
2012
fDate :
12-15 Dec. 2012
Firstpage :
301
Lastpage :
307
Abstract :
In large software projects, software quality prediction is an important aspect of the development cycle, helping to focus quality assurance efforts on the modules most likely to contain faults. To perform software quality prediction, various software metrics are collected during the software development cycle, and models are built using these metrics. However, not all features (metrics) make the same contribution to the class attribute (e.g., faulty/not faulty). Thus, selecting a subset of metrics relevant to the class attribute is a critical step. As many feature selection algorithms exist, it is important to find ones that produce consistent results even as the underlying data is changed; this quality of producing consistent results is referred to as "stability." In this paper, we investigate the stability of seven feature selection techniques in the context of software quality classification. We compare four approaches for varying the underlying data to evaluate stability: the traditional approach of generating many subsamples of the original data and comparing the features selected from each; an earlier approach developed by our research group, which compares the features selected from subsamples of the data with those selected from the original; and two newly proposed approaches based on comparing pairs of subsamples specifically designed to have the same number of instances and a specified level of overlap, with one of these new approaches comparing within each pair while the other compares the generated subsamples with the original dataset. The empirical validation is carried out on sixteen software metrics datasets. Our results show that ReliefF is the most stable feature selection technique. Results also show that the level of overlap, degree of perturbation, and feature subset size do affect the stability of feature selection methods. Finally, we find that all four approaches to evaluating stability produce similar results in terms of which feature selection techniques are best under different circumstances.
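The traditional approach described above can be illustrated with a minimal sketch. This is not the authors' exact protocol: as an assumption, it uses a simple correlation-based ranker in place of the paper's seven techniques, and mean pairwise Jaccard similarity of the selected top-k subsets as the stability score.

```python
# Hypothetical sketch: stability of a feature ranker, estimated as the mean
# pairwise Jaccard similarity between top-k feature subsets selected on
# random subsamples of the data (the "traditional" approach in the abstract).
import random

def top_k_by_correlation(rows, labels, k):
    """Rank features by absolute Pearson correlation with the label; return the top-k index set."""
    n_feat = len(rows[0])
    scores = []
    for j in range(n_feat):
        col = [r[j] for r in rows]
        mx = sum(col) / len(col)
        my = sum(labels) / len(labels)
        cov = sum((x - mx) * (y - my) for x, y in zip(col, labels))
        vx = sum((x - mx) ** 2 for x in col) or 1e-12
        vy = sum((y - my) ** 2 for y in labels) or 1e-12
        scores.append(abs(cov) / (vx * vy) ** 0.5)
    return set(sorted(range(n_feat), key=lambda j: -scores[j])[:k])

def jaccard(a, b):
    """Jaccard similarity of two feature subsets."""
    return len(a & b) / len(a | b)

def stability(rows, labels, k=3, n_subsamples=10, frac=0.7, seed=0):
    """Mean pairwise Jaccard similarity of top-k subsets over random subsamples."""
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsamples):
        idx = rng.sample(range(len(rows)), int(frac * len(rows)))
        subsets.append(top_k_by_correlation([rows[i] for i in idx],
                                            [labels[i] for i in idx], k))
    pairs = [(a, b) for i, a in enumerate(subsets) for b in subsets[i + 1:]]
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A score of 1.0 means every subsample yields the same feature subset; values near 0 indicate an unstable selector. The paper's pairwise variants differ in that they construct subsample pairs with a controlled overlap level rather than drawing them independently.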
Keywords :
pattern classification; set theory; software metrics; software quality; software reliability; ReliefF technique; data subsamples; faulty class attribute; feature subset size; metric subset selection; not-faulty class attribute; overlap level; perturbation degree; quality assurance; software development cycle; software metric feature selection technique stability; software projects; software quality classification; software quality prediction; Partitioning algorithms; Radio frequency; Software metrics; Software quality; Stability analysis; feature selection; stability; subsample
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Machine Learning and Applications (ICMLA), 2012 11th International Conference on
Conference_Location :
Boca Raton, FL
Print_ISBN :
978-1-4673-4651-1
Type :
conf
DOI :
10.1109/ICMLA.2012.142
Filename :
6406712