DocumentCode :
2138636
Title :
A novel SPDF ensemble classifier for cancer classification
Author :
Chunying Zhang ; Fang Wu ; Tuopeng Tong ; Sun Chen ; Kai Song ; Min Ma ; Guangqiang Zheng
Author_Institution :
Sch. of Chem. Eng. & Technol., Tianjin Univ., Tianjin, China
fYear :
2013
fDate :
23-25 July 2013
Firstpage :
1037
Lastpage :
1041
Abstract :
To obtain more accurate and reliable cancer classification results from DNA microarray analysis, a novel ensemble classifier, SPDF (Subspace Partial least squares based Decision Forest), is developed. The original data are split into subspaces by column. For each subspace, partial least squares (PLS) is applied to extract orthogonal latent variables (LVs). Combined with Minimal Redundancy and Maximal Relevance (MRMR) as the gene-selection preprocessing method, this successfully overcomes the adverse effect of having very high-dimensional variables with very few samples. All available LVs are then aggregated into a new training set on which the Decision Forest is trained for classification. By relying on the feature extraction power of PLS and the orthogonality of the LVs, the multicollinearity and high noise inherent in microarray data can be effectively eliminated. Moreover, the Decision Forest can enhance data variety and further reduce the dependence of the classification results on the given data. Applications to two microarray datasets show that, compared with Rotation Forest, Bagging and Boosting, the new SPDF method yields consistently accurate and robust predictive performance, with a maximal improvement of 7.26% in classification accuracy on Colon cancer classification.
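The feature-construction step described in the abstract (split the columns into subspaces, extract PLS latent variables per subspace, aggregate them into new training data) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It rests on several assumptions not stated in the record: columns are partitioned at random, only the first PLS latent variable is kept per subspace (the paper extracts several orthogonal LVs), and the MRMR gene selection and Decision Forest training stages are omitted. The function names `split_columns`, `first_pls_component` and `subspace_pls_features` are hypothetical.

```python
import math
import random


def split_columns(n_features, n_subspaces, seed=0):
    """Partition column indices into roughly equal subspaces.

    Random assignment is an assumption; the record only says "by column".
    """
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[i::n_subspaces]) for i in range(n_subspaces)]


def first_pls_component(X, y):
    """Return (weights, scores) of the first PLS1 latent variable.

    With mean-centred Xc and yc, the weight vector is Xc^T yc scaled
    to unit length, and the scores are t = Xc w.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w)) or 1.0  # guard against zero weights
    w = [v / norm for v in w]
    scores = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, scores


def subspace_pls_features(X, y, n_subspaces, seed=0):
    """Build a new feature matrix with one latent variable per column subspace."""
    columns = []
    for cols in split_columns(len(X[0]), n_subspaces, seed):
        X_sub = [[row[j] for j in cols] for row in X]
        _, scores = first_pls_component(X_sub, y)
        columns.append(scores)
    # Transpose so that rows are samples and columns are latent variables.
    return [list(row) for row in zip(*columns)]


# Toy data: 4 samples x 6 "genes", binary class labels (made up for illustration).
X_toy = [
    [1.0, 2.0, 0.5, 3.0, 1.5, 0.0],
    [0.9, 2.1, 0.4, 2.8, 1.4, 0.2],
    [3.0, 0.5, 2.0, 1.0, 0.2, 1.9],
    [3.2, 0.4, 2.1, 0.9, 0.1, 2.0],
]
y_toy = [0, 0, 1, 1]
Z_toy = subspace_pls_features(X_toy, y_toy, n_subspaces=3)
```

In this sketch `Z_toy` has one row per sample and one column per subspace; in the pipeline the abstract describes, a matrix of this kind would serve as the training data for the Decision Forest.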
Keywords :
cancer; decision trees; feature extraction; genetics; lab-on-a-chip; least squares approximations; medical computing; pattern classification; DNA microarray analysis; MRMR; PLS; SPDF ensemble classifier; colon cancer classification; data variety; ensemble classifier SPDF; feature extraction power; gene-selection preprocessing method; microarray datasets; minimal redundancy and maximal relevance; multicolinearity; orthogonal latent variable extraction; subspace partial least square based decision forest; training data; Accuracy; Bagging; Boosting; Cancer; Colon; Decision trees; Input variables; cancer classification; ensemble classifier; feature extraction; microarray data analysis; partial least squares;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Natural Computation (ICNC), 2013 Ninth International Conference on
Conference_Location :
Shenyang
Type :
conf
DOI :
10.1109/ICNC.2013.6818129
Filename :
6818129