• DocumentCode
    2138636
  • Title
    A novel SPDF ensemble classifier for cancer classification
  • Author
    Chunying Zhang ; Fang Wu ; Tuopeng Tong ; Sun Chen ; Kai Song ; Min Ma ; Guangqiang Zheng
  • Author_Institution
    Sch. of Chem. Eng. & Technol., Tianjin Univ., Tianjin, China
  • fYear
    2013
  • fDate
    23-25 July 2013
  • Firstpage
    1037
  • Lastpage
    1041
  • Abstract
    To obtain more accurate and reliable cancer classification results from DNA microarray analysis, a novel ensemble classifier, SPDF (Subspace Partial least square based Decision Forest), is developed. The original data are split column-wise into subspaces. For each subspace, partial least squares (PLS) is applied to extract orthogonal latent variables (LVs). Combined with Minimal Redundancy and Maximal Relevance (MRMR) as the gene-selection preprocessing method, this successfully overcomes the adverse effect of having far more variables than samples. All available LVs are then aggregated into a new training set on which the Decision Forest is trained for classification. By relying on the feature-extraction power of PLS and the orthogonality of the LVs, the multicollinearity and high noise inherent in microarray data can be effectively eliminated. Moreover, the Decision Forest increases data variety and further reduces the dependence of the classification results on the given data. Applications to two microarray datasets show that, compared with Rotation Forest, Bagging and Boosting, the new SPDF method yields consistently accurate and robust predictive performance, with the maximal improvement in classification accuracy reaching 7.26% on colon cancer classification.
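
    The feature-construction stage described in the abstract (column-wise subspaces, PLS latent variables per subspace, aggregation of all LVs) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary class label, a random disjoint column partition, and the single-response NIPALS variant of PLS, none of which the abstract specifies. The function names `split_columns`, `pls1_scores` and `spdf_features` are hypothetical, the data are synthetic, and the MRMR gene-selection step and the Decision Forest training are not shown.

```python
import math
import random


def split_columns(n_features, n_subspaces, seed=0):
    """Randomly partition column indices into disjoint groups (assumed scheme)."""
    idx = list(range(n_features))
    random.Random(seed).shuffle(idx)
    return [sorted(idx[k::n_subspaces]) for k in range(n_subspaces)]


def pls1_scores(X, y, n_components):
    """Single-response NIPALS PLS: return training score vectors, one list per LV."""
    n, p = len(X), len(X[0])
    # Center each column of X and center y.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]
    scores = []
    for _ in range(n_components):
        # Weight vector: direction in X of maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no covariance left to extract
            break
        w = [v / norm for v in w]
        # Score vector (the latent variable) and its loading.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        # Deflate X so that the next score vector is orthogonal to this one.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        scores.append(t)
    return scores


def spdf_features(X, y, n_subspaces, n_components, seed=0):
    """Aggregate the LVs of every column subspace into one sample-by-LV table."""
    lv_columns = []
    for cols in split_columns(len(X[0]), n_subspaces, seed):
        sub = [[row[j] for j in cols] for row in X]
        lv_columns.extend(pls1_scores(sub, y, n_components))
    # Transpose so that each row corresponds to one sample.
    return [list(r) for r in zip(*lv_columns)]


# Synthetic stand-in for a (samples x genes) expression matrix.
rng = random.Random(42)
X_demo = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(12)]
y_demo = [i % 2 for i in range(12)]
Z_demo = spdf_features(X_demo, y_demo, n_subspaces=2, n_components=2)
print(len(Z_demo), len(Z_demo[0]))
```

    In this sketch the rows of `Z_demo` would serve as the new training data for the tree ensemble. Projecting unseen samples would additionally require keeping the weights and loadings of each subspace, which is omitted here.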
  • Keywords
    cancer; decision trees; feature extraction; genetics; lab-on-a-chip; least squares approximations; medical computing; pattern classification; DNA microarray analysis; MRMR; PLS; SPDF ensemble classifier; colon cancer classification; data variety; ensemble classifier SPDF; feature extraction power; gene-selection preprocessing method; microarray datasets; minimal redundancy and maximal relevance; multicolinearity; orthogonal latent variable extraction; subspace partial least square based decision forest; training data; Accuracy; Bagging; Boosting; Cancer; Colon; Decision trees; Input variables; cancer classification; ensemble classifier; feature extraction; microarray data analysis; partial least squares;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Natural Computation (ICNC), 2013 Ninth International Conference on
  • Conference_Location
    Shenyang
  • Type
    conf
  • DOI
    10.1109/ICNC.2013.6818129
  • Filename
    6818129