DocumentCode
2138636
Title
A novel SPDF ensemble classifier for cancer classification
Author
Chunying Zhang ; Fang Wu ; Tuopeng Tong ; Sun Chen ; Kai Song ; Min Ma ; Guangqiang Zheng
Author_Institution
Sch. of Chem. Eng. & Technol., Tianjin Univ., Tianjin, China
fYear
2013
fDate
23-25 July 2013
Firstpage
1037
Lastpage
1041
Abstract
To obtain more accurate and reliable cancer classification results from DNA microarray analysis, a novel ensemble classifier, SPDF (Subspace Partial least squares based Decision Forest), is developed. The original data are split into subspaces by column. For each subspace, partial least squares (PLS) is applied to extract orthogonal latent variables (LVs). In conjunction with Minimal Redundancy and Maximal Relevance (MRMR) as the gene-selection preprocessing method, this overcomes the adverse effect of having far more variables than samples. All available LVs are then aggregated into the new training data on which the Decision Forest is trained for classification. By relying on the feature-extraction power of PLS and the orthogonality of the LVs, the multicollinearity and high noise inherent in microarray data can be effectively eliminated. Moreover, the Decision Forest enhances data variety and further reduces the dependence of the classification results on the given data. Applications to two microarray datasets show that, compared with Rotation Forest, Bagging and Boosting, the new SPDF method yields consistently accurate and robust predictive performance, with a maximal improvement of 7.26% in classification accuracy on Colon cancer classification.
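The subspace-splitting and latent-variable steps described in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption for illustration, not the authors' implementation: the function names (`column_subspaces`, `pls1_scores`, `subspace_pls_features`) are hypothetical, the columns are partitioned randomly into disjoint groups (the abstract only says the data are split "by column"), PLS is done with a single-response NIPALS variant on numerically coded class labels, and the MRMR preselection and the Decision Forest training are left out.

```python
import random

def column_subspaces(n_features, n_subspaces, seed=0):
    """Randomly partition column indices into disjoint subspaces (assumed scheme)."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[k::n_subspaces]) for k in range(n_subspaces)]

def pls1_scores(X, y, n_components):
    """Single-response PLS (NIPALS): return up to n_components score vectors."""
    n, m = len(X), len(X[0])
    # Center each column of X and the response y.
    means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - means[j] for j in range(m)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    scores = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left in X that covaries with y
        w = [v / norm for v in w]
        # Score vector (latent variable) for every sample.
        t = [sum(E[i][j] * w[j] for j in range(m)) for i in range(n)]
        tt = sum(v * v for v in t)
        if tt < 1e-12:
            break
        # Loadings, then deflate X and y so the next score is orthogonal to t.
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
        c = sum(f[i] * t[i] for i in range(n)) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        scores.append(t)
    return scores

def subspace_pls_features(X, y, n_subspaces=2, n_components=2, seed=0):
    """Extract LVs per column subspace and aggregate them, one row per sample."""
    all_scores = []
    for cols in column_subspaces(len(X[0]), n_subspaces, seed):
        sub = [[row[j] for j in cols] for row in X]
        all_scores.extend(pls1_scores(sub, y, n_components))
    return [[t[i] for t in all_scores] for i in range(len(X))]
```

Calling `subspace_pls_features(X, y)` on a samples-by-genes list of lists `X` with 0/1 labels `y` returns the aggregated LV matrix, which would then be passed to whatever tree ensemble stands in for the Decision Forest. Note that in this sketch the LVs are orthogonal within each subspace, not necessarily across subspaces.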
Keywords
cancer; decision trees; feature extraction; genetics; lab-on-a-chip; least squares approximations; medical computing; pattern classification; DNA microarray analysis; MRMR; PLS; SPDF ensemble classifier; colon cancer classification; data variety; ensemble classifier SPDF; feature extraction power; gene-selection preprocessing method; microarray datasets; minimal redundancy and maximal relevance; multicolinearity; orthogonal latent variable extraction; subspace partial least square based decision forest; training data; Accuracy; Bagging; Boosting; Cancer; Colon; Decision trees; Input variables; cancer classification; ensemble classifier; feature extraction; microarray data analysis; partial least squares;
fLanguage
English
Publisher
IEEE
Conference_Titel
Natural Computation (ICNC), 2013 Ninth International Conference on
Conference_Location
Shenyang
Type
conf
DOI
10.1109/ICNC.2013.6818129
Filename
6818129