$f$-Information Measures for Efficient Selection of Discriminative Genes From Microarray Data

Author

Maji, Pradipta

Author_Institution

Machine Intell. Unit, Indian Stat. Inst., Kolkata

Volume

56

Issue

4

fYear

2009

fDate

4/1/2009 12:00:00 AM

Firstpage

1063

Lastpage

1069

Abstract

Among the large number of genes present in microarray gene expression data, only a small fraction is effective for performing a certain diagnostic test. In this regard, mutual information has been shown to be successful for selecting a set of relevant and nonredundant genes from microarray data. However, information theory offers many more measures, such as the f-information measures, that may be suitable for selecting genes from microarray gene expression data. This paper presents different f-information measures as the evaluation criteria for the gene selection problem. To compute the gene-gene redundancy (respectively, gene-class relevance), these information measures calculate the divergence of the joint distribution of two genes' expression values (respectively, the expression values of a gene and the class labels of samples) from the joint distribution when two genes (respectively, the gene and class label) are considered to be completely independent. The performance of different f-information measures is compared with that of mutual information based on the predictive accuracy of the naive Bayes classifier, the K-nearest neighbor rule, and the support vector machine. An important finding is that some f-information measures are shown to be effective for selecting relevant and nonredundant genes from microarray data. The effectiveness of different f-information measures, along with a comparison with mutual information, is demonstrated on breast cancer, leukemia, and colon cancer datasets. While some f-information measures provide 100% prediction accuracy for all three microarray datasets, mutual information attains this accuracy only for the breast cancer dataset, and attains 98.6% and 93.6% for the leukemia and colon cancer datasets, respectively.
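The divergence idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes expression values have already been discretized into a few levels, and it shows only two example measures, mutual information and V-information (the sum of absolute differences between the joint distribution and the product of the marginals). The function names `joint_and_marginals`, `mutual_information`, and `v_information` are hypothetical, chosen here for illustration.

```python
import math
from collections import Counter


def joint_and_marginals(xs, ys):
    """Estimate the joint and marginal distributions from paired discrete samples."""
    n = len(xs)
    joint = {k: c / n for k, c in Counter(zip(xs, ys)).items()}
    px = {k: c / n for k, c in Counter(xs).items()}
    py = {k: c / n for k, c in Counter(ys).items()}
    return joint, px, py


def mutual_information(xs, ys):
    """Kullback-Leibler divergence of the joint from the product of marginals (in nats)."""
    joint, px, py = joint_and_marginals(xs, ys)
    # Cells with zero joint probability contribute nothing, so only observed cells are summed.
    return sum(p * math.log(p / (px[x] * py[y])) for (x, y), p in joint.items())


def v_information(xs, ys):
    """Sum of |joint - product of marginals| over every cell (assumed example of an f-information)."""
    joint, px, py = joint_and_marginals(xs, ys)
    return sum(abs(joint.get((x, y), 0.0) - px[x] * py[y]) for x in px for y in py)


# Hypothetical discretized data: one gene's expression levels and the sample class labels.
gene = [0, 0, 1, 1, 2, 2]
label = [0, 0, 1, 1, 1, 1]
relevance_mi = mutual_information(gene, label)
relevance_v = v_information(gene, label)
```

Passing a gene and the class labels gives a gene-class relevance score; passing two genes gives a gene-gene redundancy score. Both measures are zero when the two variables are empirically independent and grow as the joint distribution moves away from the product of the marginals.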

Keywords

Bayes methods; bioinformatics; cancer; feature extraction; genetics; information theory; molecular biophysics; support vector machines; K-nearest neighbor rule; breast cancer; colon cancer; f-information measure; gene selection problem; gene-class relevance; gene-gene redundancy; leukemia; microarray gene expression data; mutual information; naive Bayes classifier; support vector machine; Accuracy; Colon; Distributed computing; Gene expression; Genetic communication; Performance evaluation; Testing; Classification; feature selection; gene selection; microarray analysis; Gene Expression Profiling; Models, Genetic; Oligonucleotide Array Sequence Analysis

fLanguage

English

Journal_Title

Biomedical Engineering, IEEE Transactions on

Publisher

IEEE

ISSN

0018-9294

Type

jour

DOI

10.1109/TBME.2008.2004502

Filename

4625953