Title :
A novel SPDF ensemble classifier for cancer classification
Author :
Chunying Zhang ; Fang Wu ; Tuopeng Tong ; Sun Chen ; Kai Song ; Min Ma ; Guangqiang Zheng
Author_Institution :
Sch. of Chem. Eng. & Technol., Tianjin Univ., Tianjin, China
Abstract :
To obtain more accurate and reliable cancer classification results from DNA microarray analysis, a novel ensemble classifier, SPDF (Subspace Partial least square based Decision Forest), is developed. The original data are split column-wise into subspaces. For each subspace, partial least squares (PLS) is applied to extract orthogonal latent variables (LVs). Combined with Minimal Redundancy and Maximal Relevance (MRMR) as the gene-selection preprocessing method, this approach overcomes the adverse effect of very high-dimensional variables with very few samples. All available LVs are then aggregated into new training data, on which the Decision Forest is trained for classification. Relying on the feature extraction power of PLS and the orthogonality of the LVs, the multicollinearity and high noise inherent in microarray data can be effectively eliminated. Moreover, the Decision Forest enhances data variety and further reduces the dependence of the classification results on the given data. Applications to two microarray datasets show that, compared with Rotation Forest, Bagging and Boosting, the new SPDF method yields consistently accurate and robust predictive performance, with a maximal improvement in classification accuracy of 7.26% on the Colon cancer dataset.
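The feature-construction stages named in the abstract (MRMR gene selection, a column-wise split into subspaces, PLS latent variables per subspace, then aggregation of all LVs) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the three-level discretization thresholds, the difference form of MRMR, the random column partition, the number of LVs per subspace, and every function name (`mrmr_select`, `pls_fit`, `pls_transform`, `subspace_pls_features`) are hypothetical. The Decision Forest itself is left out; the aggregated LV matrix is what it would be trained on.

```python
import numpy as np


def discretize(X):
    """Code each gene as 0/1/2 at mean -/+ one std (assumed thresholds)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    return (X > mu - sd).astype(int) + (X > mu + sd).astype(int)


def mutual_info(a, b):
    """Mutual information (nats) between two non-negative integer vectors."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)
    joint /= joint.sum()
    outer = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / outer[nz])))


def mrmr_select(X, y, k):
    """Greedy MRMR (assumed difference form): relevance minus mean redundancy."""
    D = discretize(X)
    n_genes = D.shape[1]
    relevance = np.array([mutual_info(D[:, j], y) for j in range(n_genes)])
    selected = [int(np.argmax(relevance))]
    redundancy = np.zeros(n_genes)  # running sum of MI with already selected genes
    while len(selected) < k:
        last = D[:, selected[-1]]
        redundancy += [mutual_info(D[:, j], last) for j in range(n_genes)]
        score = relevance - redundancy / len(selected)
        score[selected] = -np.inf  # never pick the same gene twice
        selected.append(int(np.argmax(score)))
    return selected


def pls_fit(X, y, n_lv):
    """PLS1 via NIPALS; returns (column means, weights W, loadings P)."""
    mean = X.mean(axis=0)
    Xc, yc = X - mean, y - y.mean()
    W, P = [], []
    for _ in range(n_lv):
        w = Xc.T @ yc
        if np.linalg.norm(w) < 1e-12:  # no covariance with y left to extract
            break
        w = w / np.linalg.norm(w)
        t = Xc @ w
        p = Xc.T @ t / (t @ t)
        Xc = Xc - np.outer(t, p)  # deflation makes the next score orthogonal to t
        yc = yc - t * (t @ yc) / (t @ t)
        W.append(w)
        P.append(p)
    n_cols = X.shape[1]
    return mean, np.array(W).reshape(-1, n_cols).T, np.array(P).reshape(-1, n_cols).T


def pls_transform(model, X):
    """Project samples onto the fitted LVs by replaying the deflation steps."""
    mean, W, P = model
    Xc, T = X - mean, []
    for w, p in zip(W.T, P.T):
        t = Xc @ w
        Xc = Xc - np.outer(t, p)
        T.append(t)
    return np.column_stack(T) if T else np.empty((len(X), 0))


def subspace_pls_features(X, y, n_subspaces, n_lv, rng):
    """Partition columns into subspaces (random split is an assumption), fit PLS per block."""
    blocks = np.array_split(rng.permutation(X.shape[1]), n_subspaces)
    models = [(b, pls_fit(X[:, b], y, n_lv)) for b in blocks]

    def transform(X_new):
        # Aggregate the LVs of every subspace side by side.
        return np.hstack([pls_transform(m, X_new[:, b]) for b, m in models])

    return transform


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 200))  # synthetic data: 60 samples, 200 "genes"
y = (X[:, 0] > 0).astype(int)   # synthetic labels driven by gene 0
genes = mrmr_select(X, y, k=20)
transform = subspace_pls_features(X[:, genes], y, n_subspaces=4, n_lv=2, rng=rng)
Z = transform(X[:, genes])      # aggregated LVs: 4 subspaces x 2 LVs each
print(Z.shape)                  # (60, 8)
```

A decision-forest classifier could then be trained on `Z`, for example scikit-learn's `RandomForestClassifier` as a stand-in, since the abstract does not specify how the authors' Decision Forest is built. Replaying the NIPALS deflation in `pls_transform` keeps the projection of new samples consistent with the scores computed during fitting, and the deflation step is what makes the LVs within each subspace mutually orthogonal.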
Keywords :
cancer; decision trees; feature extraction; genetics; lab-on-a-chip; least squares approximations; medical computing; pattern classification; DNA microarray analysis; MRMR; PLS; SPDF ensemble classifier; colon cancer classification; data variety; ensemble classifier SPDF; feature extraction power; gene-selection preprocessing method; microarray datasets; minimal redundancy and maximal relevance; multicollinearity; orthogonal latent variable extraction; subspace partial least square based decision forest; training data; Accuracy; Bagging; Boosting; Cancer; Colon; Decision trees; Input variables; cancer classification; ensemble classifier; feature extraction; microarray data analysis; partial least squares;
Conference_Title :
Natural Computation (ICNC), 2013 Ninth International Conference on
Conference_Location :
Shenyang
DOI :
10.1109/ICNC.2013.6818129