Title :
Improving reliability of gene selection from microarray functional genomics data
Author :
Fu, Li M. ; Youn, Eun Seog
Author_Institution :
Univ. of Florida, Gainesville, FL, USA
Abstract :
Constructing a classifier from microarray gene expression data has recently emerged as an important problem in cancer classification. Recent results suggest that such a classifier can achieve reasonable predictive accuracy even when only a small number of cancer tissue samples of known type are available. The difficulty is that each sample contains expression data for a vast number of genes, and these genes may interact with one another. Selecting a small number of critical genes is fundamental to correctly analyzing the otherwise overwhelming data. A multivariate approach is essential for capturing the correlated structure in the data. However, the curse of dimensionality raises concern about the reliability of the selected genes. Here, we present a new gene selection method in which the error and repeatability of selected genes are assessed within the context of M-fold cross-validation. In particular, we show that the method is able to identify the source variables underlying data generation.
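The idea of assessing both error and repeatability within M-fold cross-validation can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give the selection rule, so the sketch assumes a simple univariate score (difference of class means) and a nearest-centroid classifier as stand-ins, and it uses synthetic data in which two "source" genes generate the class signal. All function names and parameters here are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def make_synthetic(n_samples=20, n_genes=10, seed=0):
    """Toy data (an assumption): genes 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[(5.0 * label if g < 2 else 0.0) + rng.uniform(-1, 1)
          for g in range(n_genes)] for label in y]
    return X, y


def select_top(X, y, k):
    """Rank genes by |mean(class 1) - mean(class 0)|; a univariate stand-in score."""
    scores = []
    for g in range(len(X[0])):
        pos = [row[g] for row, label in zip(X, y) if label == 1]
        neg = [row[g] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)))
    return sorted(range(len(scores)), key=lambda g: -scores[g])[:k]


def centroid_predict(X_train, y_train, genes, x):
    """Nearest-centroid prediction restricted to the selected genes."""
    def sq_dist(label):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        return sum((x[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
    return 1 if sq_dist(1) < sq_dist(0) else 0


def cv_gene_selection(X, y, m=5, k=2, seed=0):
    """Return (held-out error rate, per-gene selection frequency over m folds).

    Assumes every training split contains samples of both classes.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::m] for i in range(m)]
    counts, errors = Counter(), 0
    for test_idx in folds:
        held_out = set(test_idx)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        # Genes are selected on the training split only, so the error is unbiased by selection.
        genes = select_top(X_tr, y_tr, k)
        counts.update(genes)
        errors += sum(centroid_predict(X_tr, y_tr, genes, X[i]) != y[i]
                      for i in test_idx)
    repeatability = {g: c / m for g, c in counts.items()}
    return errors / len(X), repeatability


X, y = make_synthetic()
error_rate, repeatability = cv_gene_selection(X, y, m=5, k=2)
print("error rate:", error_rate)
print("selection frequency per gene:", repeatability)
```

In this sketch, a gene that is selected in every fold has a selection frequency of 1.0, which is one simple way to read "repeatability"; the paper itself may define and combine the two quantities differently.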
Keywords :
biology computing; cancer; data analysis; genetics; learning automata; medical computing; M-fold cross-validation; cancer classification; data generation; dimensionality; gene selection; microarray functional genomics data; microarray gene expression data; multivariate approach; support vector machine; Accuracy; Bioinformatics; Cancer; Data analysis; Gene expression; Genomics; Supervised learning; Support vector machine classification; Support vector machines; Unsupervised learning; Algorithms; Colonic Neoplasms; DNA, Neoplasm; Databases, Nucleic Acid; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genomics; Humans; Leukemia; Oligonucleotide Array Sequence Analysis; Reproducibility of Results; Sensitivity and Specificity; Sequence Alignment; Sequence Analysis, DNA;
Journal_Title :
Information Technology in Biomedicine, IEEE Transactions on
DOI :
10.1109/TITB.2003.816558