DocumentCode :
863552
Title :
f-Information Measures for Efficient Selection of Discriminative Genes From Microarray Data
Author :
Maji, Pradipta
Author_Institution :
Machine Intell. Unit, Indian Stat. Inst., Kolkata
Volume :
56
Issue :
4
fYear :
2009
fDate :
4/1/2009
Firstpage :
1063
Lastpage :
1069
Abstract :
Among the large number of genes present in microarray gene expression data, only a small fraction is effective for performing a certain diagnostic test. In this regard, mutual information has been shown to be successful for selecting a set of relevant and nonredundant genes from microarray data. However, information theory offers many more measures, such as the f-information measures, that may be suitable for selecting genes from microarray gene expression data. This paper presents different f-information measures as the evaluation criteria for the gene selection problem. To compute the gene-gene redundancy (respectively, gene-class relevance), these information measures calculate the divergence of the joint distribution of two genes' expression values (respectively, the expression values of a gene and the class labels of samples) from the joint distribution when the two genes (respectively, the gene and class label) are considered to be completely independent. The performance of different f-information measures is compared with that of mutual information based on the predictive accuracy of the naive Bayes classifier, the K-nearest neighbor rule, and the support vector machine. An important finding is that some f-information measures are shown to be effective for selecting relevant and nonredundant genes from microarray data. The effectiveness of different f-information measures, along with a comparison with mutual information, is demonstrated on breast cancer, leukemia, and colon cancer datasets. While some f-information measures provide 100% prediction accuracy for all three microarray datasets, mutual information attains this accuracy only for the breast cancer dataset, reaching 98.6% and 93.6% for the leukemia and colon cancer datasets, respectively.
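Illustrative sketch (not part of the published abstract): the divergence described above, between an observed joint distribution and the product of its marginals, can be written for discrete data as sum over cells of q * f(p / q), where p is the joint probability and q the product of marginals. The Python below is a minimal example under stated assumptions: expression values are taken to be already discretized, the function names (`f_information`, `f_kl`, `f_v`) are hypothetical, and the two generator functions shown (t log t, which gives mutual information, and |t - 1|, a total-variation-style measure) are common textbook choices rather than necessarily the measures evaluated in the paper.

```python
import math
from collections import Counter


def joint_and_marginals(xs, ys):
    """Empirical joint and marginal distributions of two discrete sequences."""
    n = len(xs)
    joint = {k: v / n for k, v in Counter(zip(xs, ys)).items()}
    px = {k: v / n for k, v in Counter(xs).items()}
    py = {k: v / n for k, v in Counter(ys).items()}
    return joint, px, py


def f_information(xs, ys, f):
    """Divergence of the joint distribution from the product of marginals.

    Computes sum over all (x, y) cells of q * f(p / q), where p is the
    joint probability and q = P(x) * P(y) is the value under independence.
    """
    joint, px, py = joint_and_marginals(xs, ys)
    total = 0.0
    for x, p_x in px.items():
        for y, p_y in py.items():
            q = p_x * p_y                 # always > 0 for observed values
            p = joint.get((x, y), 0.0)    # 0 if the pair never co-occurs
            total += q * f(p / q)
    return total


def f_kl(t):
    """Generator t * log(t); yields mutual information (in nats)."""
    return t * math.log(t) if t > 0 else 0.0


def f_v(t):
    """Generator |t - 1|; yields a total-variation-style measure."""
    return abs(t - 1.0)


# Hypothetical toy data: a discretized gene and binary class labels.
gene_dependent = [0, 0, 1, 1]
gene_independent = [0, 1, 0, 1]
labels = [0, 0, 1, 1]

mi_dep = f_information(gene_dependent, labels, f_kl)
v_dep = f_information(gene_dependent, labels, f_v)
mi_ind = f_information(gene_independent, labels, f_kl)
v_ind = f_information(gene_independent, labels, f_v)
```

Passing a gene and the class labels gives a relevance score, while passing two genes gives a redundancy score; both measures are zero when the two variables are empirically independent.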
Keywords :
Bayes methods; bioinformatics; cancer; feature extraction; genetics; information theory; molecular biophysics; support vector machines; K-nearest neighbor rule; breast cancer; colon cancer; f-information measure; gene selection problem; gene-class relevance; gene-gene redundancy; information theory; leukemia; microarray gene expression data; mutual information; naive Bayes classifier; support vector machine; Accuracy; Breast cancer; Colon; Distributed computing; Gene expression; Genetic communication; Information theory; Mutual information; Performance evaluation; Testing; Classification; feature selection; gene selection; microarray analysis; mutual information; Gene Expression Profiling; Models, Genetic; Oligonucleotide Array Sequence Analysis;
fLanguage :
English
Journal_Title :
Biomedical Engineering, IEEE Transactions on
Publisher :
IEEE
ISSN :
0018-9294
Type :
jour
DOI :
10.1109/TBME.2008.2004502
Filename :
4625953