Title :
Relevant and Significant Supervised Gene Clusters for Microarray Cancer Classification
Author :
Maji, Pradipta ; Das, Chandra
Author_Institution :
Machine Intell. Unit, Indian Stat. Inst., Kolkata, India
fDate :
6/1/2012 12:00:00 AM
Abstract :
An important application of microarray data in functional genomics is the classification of samples according to their gene expression profiles, for example, distinguishing cancer from normal samples or distinguishing different types or subtypes of cancer. One of the major tasks with gene expression data is to find groups of co-regulated genes whose collective expression is strongly associated with the sample categories. In this regard, a supervised gene clustering algorithm is proposed to group genes from microarray data. It directly incorporates the information of sample categories into the grouping process in order to find groups of co-regulated genes with strong association to the sample categories. The average expression of the genes in each cluster acts as its representative. A subset of significant representatives forms the reduced feature set used to build the classifiers for cancer classification. Mutual information is used to compute both gene-gene redundancy and gene-class relevance. The performance of the proposed method is studied and compared with that of existing methods on six cancer microarray data sets, using the predictive accuracy of the naive Bayes classifier, the K-nearest neighbor rule, and the support vector machine. The results show that the proposed algorithm is effective for identifying biologically significant gene clusters with excellent predictive capability.
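The ideas named in the abstract (mutual information as gene-class relevance, and the averaged expression of a cluster as its representative) can be illustrated with a minimal sketch. This is not the authors' algorithm: the equal-width discretization, the greedy growth rule, and the names `discretize`, `mutual_information`, `relevance`, and `supervised_cluster` are all assumptions made for illustration, and the gene-gene redundancy term is omitted for brevity.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-width binning of floats into integer bin labels (an assumed choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete labels."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def relevance(expr, labels, n_bins=3):
    """Gene-class relevance: MI between a discretized profile and class labels."""
    return mutual_information(discretize(expr, n_bins), labels)


def average(profiles):
    """Sample-wise mean of several expression profiles (the cluster representative)."""
    return [sum(col) / len(col) for col in zip(*profiles)]


def supervised_cluster(genes, labels, n_bins=3):
    """Hypothetical greedy rule: seed with the most relevant gene, then add a
    gene only if the averaged representative becomes more relevant."""
    ranked = sorted(genes, key=lambda g: -relevance(genes[g], labels, n_bins))
    members = [ranked[0]]
    rep = list(genes[ranked[0]])
    best = relevance(rep, labels, n_bins)
    for name in ranked[1:]:
        candidate = average([genes[m] for m in members] + [genes[name]])
        score = relevance(candidate, labels, n_bins)
        if score > best:
            members.append(name)
            rep, best = candidate, score
    return members, rep, best
```

In a pipeline of the kind the abstract outlines, representatives such as `rep` from several clusters would form the reduced feature set passed to a classifier.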
Keywords :
Bayes methods; cancer; genetics; genomics; medical diagnostic computing; support vector machines; K-nearest neighbor rule; functional genomics; gene clustering algorithm; gene expression profiles; microarray cancer classification; mutual information; naive Bayes classifier; supervised gene clusters; support vector machine; Accuracy; Cancer; Clustering algorithms; Gene expression; Mutual information; Support vector machines; Training; Classification; feature selection; gene clustering; microarray analysis; mutual information; Bayes Theorem; Cluster Analysis; Computational Biology; Female; Gene Expression Profiling; Genes, Neoplasm; Humans; Male; Multigene Family; Neoplasms; Oligonucleotide Array Sequence Analysis; Support Vector Machines;
Journal_Title :
NanoBioscience, IEEE Transactions on
DOI :
10.1109/TNB.2012.2193590