Title :
Developing an Effective Validation Strategy for Genetic Programming Models Based on Multiple Datasets
Author :
Liu, Yi ; Khoshgoftaar, Taghi ; Yao, Jenq-Foung
Author_Institution :
Georgia Coll. & State Univ., Milledgeville, GA
Abstract :
Genetic programming (GP) is a parallel search technique in which many solutions are obtained simultaneously during the search. However, when applied to real-world classification tasks, some of the obtained solutions may have poor predictive performance. One reason is that these solutions only match the shape of the training dataset and fail to learn and generalize the patterns hidden in it. As a result, unexpectedly poor results are obtained when the solutions are applied to the test dataset. This paper addresses how to remove the solutions that would perform unacceptably on the test dataset. The proposed method applies a multi-dataset validation phase as a filter in GP-based classification tasks. By comparing the proposed method with a standard GP classifier on datasets from seven different NASA software projects, we demonstrate that multi-dataset validation is effective and can significantly improve the performance of GP-based software quality classification models.
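The filtering idea in the abstract can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so everything here is an assumption made for illustration: the names `error_rate` and `filter_models`, the `max_error` threshold, the rule that a model must pass on every validation dataset, and the toy rules and data.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]      # (feature vector, class label)
Dataset = List[Sample]
Model = Callable[[Sequence[float]], int]  # a candidate classifier, e.g. one GP solution


def error_rate(model: Model, data: Dataset) -> float:
    """Fraction of samples in `data` that `model` misclassifies."""
    wrong = sum(1 for features, label in data if model(features) != label)
    return wrong / len(data)


def filter_models(models: List[Model],
                  validation_sets: List[Dataset],
                  max_error: float) -> List[Model]:
    """Keep only models whose error is at most `max_error` on EVERY validation set.

    Assumed criterion for illustration; the paper's actual rule may differ.
    """
    return [
        m for m in models
        if all(error_rate(m, d) <= max_error for d in validation_sets)
    ]


# Two hypothetical candidate solutions.
def general_rule(x: Sequence[float]) -> int:
    return 1 if x[0] > 10 else 0


def overfit_rule(x: Sequence[float]) -> int:
    # Memorizes specific values seen in one dataset instead of a pattern.
    return 1 if x[0] in (12, 15) else 0


# Toy validation datasets (fabricated for the example only).
val_a: Dataset = [([12], 1), ([3], 0), ([15], 1), ([8], 0)]
val_b: Dataset = [([20], 1), ([11], 1), ([2], 0), ([9], 0)]

kept = filter_models([general_rule, overfit_rule], [val_a, val_b], max_error=0.25)
kept_names = [m.__name__ for m in kept]
print(kept_names)
```

In this toy setup `overfit_rule` is perfect on `val_a` but misclassifies half of `val_b`, so a single validation dataset would not have removed it, while the multi-dataset filter does.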
Keywords :
genetic algorithms; pattern classification; program verification; software quality; NASA software project; genetic programming; model selection; multidataset validation; paired t-tests; software metrics; software quality classification; Filters; NASA; Pattern matching; Performance evaluation; Shape; Software performance; Software standards; Testing; cost misclassification; multiple datasets; validation
Conference_Title :
Information Reuse and Integration, 2006 IEEE International Conference on
Conference_Location :
Waikoloa Village, HI
Print_ISBN :
0-7803-9788-6
DOI :
10.1109/IRI.2006.252418