• DocumentCode
    2225519
  • Title

    A hybrid algorithm of minimum spanning tree and nearest neighbor for classifying human cancers

  • Author

    Zhou, Chunbao ; Wan, Liming ; Liang, Yanchun

  • Author_Institution
    Key Lab. of Symbol Comput. & Knowledge Eng. of the Minist. of Educ., Jilin Univ., Changchun, China
  • Volume
    5
  • fYear
    2010
  • fDate
    20-22 Aug. 2010
  • Abstract
    Classification and prediction of different cancers based on gene expression profiles are important for cancer diagnosis, cancer treatment and medication discovery. The k nearest neighbor algorithm (k-NN) is one easy and efficient machine learning method for cancer classification and the parameter k is crucial. In this paper, we integrate minimum spanning tree (MST) and k nearest neighbor algorithm (k-NN) for cancer classification. The MST is designed for the selection of parameter k and the nearest neighbors for k-NN. Firstly we build a minimum spanning tree (MST) based on Euclidean distance between each two samples for gene expression data only including one unknown class sample. Secondly for unknown class sample in the gene expression data, we find the connected samples and then apply majority vote principle. Thirdly if there are tied votes then we expend the connected samples with the nearest neighbors for unknown class sample until all the samples are expended or the class for unknown sample is obtained. This hybrid algorithm is referred to as MSTNN. The hybrid algorithm MSTNN is compared with k-NN and other 3 existing classification algorithms on CNS dataset, Colon dataset and Lung dataset, 3 binary class gene expression datasets and 3 multi-class gene expression datasets: Leukemia1, Leukemia2, and Leukemia3 involving human cancers. The MSTNN algorithm improves 5.65% better than k-NN on average LOOCV accuracy and 13.80% better than k-NN on testing datasets classification average accuracy, and achieves the best performance in all the 5 algorithms. The results demonstrate that the proposed MSTNN algorithm is feasible to classify human cancers.
  • Keywords
    cancer; genetics; medical diagnostic computing; patient diagnosis; patient treatment; pattern classification; trees (mathematics); Euclidean distance; cancer diagnosis; cancer treatment; gene expression dataset; human cancer classification; hybrid algorithm; medication discovery; minimum spanning tree; nearest neighbor algorithm; Artificial neural networks; Cancer; Classification algorithms; Colon; Lungs; Niobium; Support vector machines; cancer classification; gene expression profile; k-nearest-neighbor; minimum spanning tree;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Advanced Computer Theory and Engineering (ICACTE), 2010 3rd International Conference on
  • Conference_Location
    Chengdu
  • ISSN
    2154-7491
  • Print_ISBN
    978-1-4244-6539-2
  • Type

    conf

  • DOI
    10.1109/ICACTE.2010.5579426
  • Filename
    5579426