DocumentCode :
1092683
Title :
Development of Two-Stage SVM-RFE Gene Selection Strategy for Microarray Expression Data Analysis
Author :
Tang, Yuchun ; Zhang, Yan-Qing ; Huang, Zhen
Volume :
4
Issue :
3
fYear :
2007
Firstpage :
365
Lastpage :
381
Abstract :
Extracting a subset of informative genes from microarray expression data is a critical data preparation step in cancer classification and other biological function analyses. Though many algorithms have been developed, the Support Vector Machine - Recursive Feature Elimination (SVM-RFE) algorithm is one of the best gene feature selection algorithms. It assumes that a smaller "filter-out" factor in the SVM-RFE, which results in a smaller number of gene features eliminated in each recursion, should lead to extraction of a better gene subset. Because the SVM-RFE is highly sensitive to the "filter-out" factor, our simulations have shown that this assumption is not always correct and that the SVM-RFE is an unstable algorithm. To select a set of key gene features for reliable prediction of cancer types or subtypes and other applications, a new two-stage SVM-RFE algorithm has been developed. It is designed to effectively eliminate most of the irrelevant, redundant and noisy genes while keeping information loss small at the first stage. A fine selection for the final gene subset is then performed at the second stage. The two-stage SVM-RFE overcomes the instability problem of the SVM-RFE to achieve better algorithm utility. We have demonstrated that the two-stage SVM-RFE is significantly more accurate and more reliable than the SVM-RFE and three correlation-based methods based on our analysis of three publicly available microarray expression datasets. Furthermore, the two-stage SVM-RFE is computationally efficient because its time complexity is $O(d \log_2 d)$, where $d$ is the size of the original gene set.
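The role of the "filter-out" factor and the coarse-then-fine idea in the abstract can be illustrated with a minimal sketch of the elimination schedule alone. It is not the authors' implementation: no SVM is trained, and the function names, the hand-off size of 64 genes, the coarse factor of 0.5, and the one-gene-per-step fine stage are all assumptions made for illustration, since the abstract does not specify them.

```python
def elimination_schedule(d, filter_out, stop_at):
    """Gene counts visited when a fraction `filter_out` of the remaining
    genes (at least one) is dropped in each recursion, down to `stop_at`."""
    counts = [d]
    while counts[-1] > stop_at:
        remaining = counts[-1]
        drop = max(1, int(remaining * filter_out))
        counts.append(max(stop_at, remaining - drop))
    return counts


def two_stage_schedule(d, coarse_factor=0.5, handoff=64, final_size=8):
    """Hypothetical two-stage schedule: coarse elimination down to `handoff`
    genes, then removal of one gene per recursion down to `final_size`."""
    coarse = elimination_schedule(d, coarse_factor, handoff)
    # Fine stage starts from wherever the coarse stage stopped.
    fine = list(range(coarse[-1] - 1, final_size - 1, -1))
    return coarse, fine


if __name__ == "__main__":
    coarse, fine = two_stage_schedule(1024)
    print("coarse stage gene counts:", coarse)
    print("fine stage recursions:", len(fine), "ending at", fine[-1], "genes")
```

With a factor of 0.5 the coarse stage needs only about $\log_2 d$ recursions, whereas removing one gene at a time from the full set would need on the order of $d$ recursions; in a real pipeline each recursion would retrain an SVM and rank the surviving genes by weight.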
Keywords :
Bioinformatics; Cancer; DNA; Data analysis; Data mining; Gene expression; Genomics; Organisms; Support vector machine classification; Support vector machines; Bioinformatics; Cancer Classification; Feature Selection; Gene Selection; Microarray Gene Expression Data Analysis; Recursive Feature Elimination; Support Vector Machines; Algorithms; Artificial Intelligence; Diagnosis, Computer-Assisted; Gene Expression Profiling; Humans; Neoplasm Proteins; Neoplasms; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Tumor Markers, Biological;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2007.70224
Filename :
4288063