DocumentCode :
952286
Title :
Prediction of Cancer Class with Majority Voting Genetic Programming Classifier Using Gene Expression Data
Author :
Paul, Topon Kumar ; Iba, Hitoshi
Author_Institution :
Syst. Eng. Lab., Toshiba Corp., Kawasaki
Volume :
6
Issue :
2
fYear :
2009
Firstpage :
353
Lastpage :
367
Abstract :
To better understand different types of cancer and to find possible biomarkers for these diseases, many researchers have recently been analyzing gene expression data with various machine learning techniques. However, because the number of training samples is very small compared to the number of genes, and because the classes are imbalanced, most of these methods suffer from overfitting. In this paper, we present a majority voting genetic programming classifier (MVGPC) for the classification of microarray data. Instead of a single rule or a single set of rules, we evolve multiple rules with genetic programming (GP) and then apply those rules to test samples, determining their labels by majority voting. In experiments on four public cancer data sets, including multiclass data sets, we found that the test accuracies of MVGPC are better than those of other methods, including AdaBoost with GP. Moreover, some of the more frequently occurring genes in the classification rules are known to be associated with the types of cancer studied in this paper.
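The voting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rules below are hand-written stand-ins for GP-evolved rules, and the names `majority_vote` and `rules`, the class labels, and the expression values are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Assumption: a sample is a vector of gene expression values, and each
# rule maps a sample to a class label. In MVGPC the rules would be
# evolved by genetic programming; here they are written by hand.
Sample = Sequence[float]
Rule = Callable[[Sample], Hashable]


def majority_vote(rules: Sequence[Rule], sample: Sample) -> Hashable:
    """Return the label predicted by the largest number of rules.

    Ties go to the label that was voted for first (Counter preserves
    insertion order); this tie-break is an assumption of the sketch.
    """
    votes = Counter(rule(sample) for rule in rules)
    return votes.most_common(1)[0][0]


# Hypothetical rules over three genes (indices 0, 1, 2).
rules = [
    lambda x: "tumor" if x[0] - x[2] > 0 else "normal",
    lambda x: "tumor" if x[1] > 1.5 else "normal",
    lambda x: "tumor" if x[0] * x[1] > x[2] else "normal",
]

sample = [2.0, 1.0, 0.5]  # made-up expression values
prediction = majority_vote(rules, sample)  # two of three rules vote "tumor"
```

Using an odd number of rules avoids ties in the two-class case; how the paper handles ties and multiclass data is not stated in the abstract.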
Keywords :
cancer; data mining; feature extraction; genetic algorithms; genetics; learning (artificial intelligence); medical information systems; biomarkers; cancer class prediction; gene expression data; machine learning; majority voting genetic programming classifier; Classifier design and evaluation; Data mining; Evolutionary computing and genetic algorithms; Feature extraction or construction; data mining; evolutionary computing and genetic algorithm; feature extraction; gene expression; majority voting; Algorithms; Artificial Intelligence; Databases, Genetic; Gene Expression Profiling; Humans; Models, Genetic; Neoplasms; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Software
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.70245
Filename :
4359894