• DocumentCode
    1458703
  • Title
    A New Unsupervised Feature Ranking Method for Gene Expression Data Based on Consensus Affinity
  • Author
    Zhang, Shaohong ; Wong, Hau-San ; Shen, Ying ; Xie, Dongqing
  • Author_Institution
    Dept. of Comput. Sci., Guangzhou Univ., Guangzhou, China
  • Volume
    9
  • Issue
    4
  • fYear
    2012
  • Firstpage
    1257
  • Lastpage
    1263
  • Abstract
    Feature selection is widely established as one of the fundamental computational techniques in mining microarray data. Because categorized (labeled) information is often lacking in practice, unsupervised feature selection is of greater practical importance but correspondingly more difficult. Motivated by cluster ensemble techniques, which combine multiple clustering solutions into a consensus solution of higher accuracy and stability, recent efforts in unsupervised feature selection have proposed using these consensus solutions as oracles. However, these methods depend on both the particular cluster ensemble algorithm used and knowledge of the true cluster number, so they are unsuitable when the true cluster number is not available, which is common in practice. To address these problems, a new unsupervised feature ranking method is proposed that evaluates the importance of features based on consensus affinity. Unlike previous work, our method compares the affinity that each feature induces between a pair of instances with the corresponding affinity in the consensus matrix of the clustering solutions. As a result, our method alleviates both the need to know the true number of clusters and the dependence on a particular cluster ensemble approach. Experiments on real gene expression data sets demonstrate significant improvement of the feature ranking results when compared to several state-of-the-art techniques.
  • Keywords
    biology computing; data mining; feature extraction; genetic algorithms; genetics; lab-on-a-chip; cluster ensemble techniques; consensus affinity; feature selection; fundamental computational techniques; gene expression data; microarray data mining; multiple clustering solutions; unsupervised feature ranking method; Bioinformatics; Clustering algorithms; Gene expression; Indexes; Laplace equations; Partitioning algorithms; Principal component analysis; Unsupervised feature ranking; cluster ensembles; gene selection; Algorithms; Cluster Analysis; Computational Biology; Databases, Genetic; Gene Expression Profiling; Humans; Neoplasms; Oligonucleotide Array Sequence Analysis
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2012.34
  • Filename
    6158634
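
The abstract describes ranking features against a consensus matrix built from multiple clustering solutions, without fixing a cluster number. The following is only a minimal sketch of that general idea, not the authors' method: the abstract does not give the scoring formula, so the score used here (Pearson correlation between a per-feature pairwise affinity and the consensus affinity) is an assumption, as are the names `consensus_matrix` and `rank_features`, the toy data, and the hand-written cluster labelings.

```python
import numpy as np


def consensus_matrix(labelings):
    """Fraction of clustering solutions that put each pair of instances together."""
    labelings = np.asarray(labelings)  # shape: (n_solutions, n_instances)
    same = labelings[:, :, None] == labelings[:, None, :]
    return same.mean(axis=0)


def rank_features(X, consensus):
    """Hypothetical score: correlation of per-feature affinity with consensus affinity."""
    n_instances, n_features = X.shape
    upper = np.triu_indices(n_instances, k=1)  # each instance pair once
    target = consensus[upper]
    scores = np.zeros(n_features)
    for j in range(n_features):
        # Assumed per-feature affinity: negative absolute difference of the pair.
        affinity = -np.abs(X[:, None, j] - X[None, :, j])[upper]
        if affinity.std() > 0 and target.std() > 0:
            scores[j] = np.corrcoef(affinity, target)[0, 1]
    order = np.argsort(-scores)  # best-scoring feature first
    return order, scores


# Toy data: feature 0 separates two groups, feature 1 is noise, feature 2 is constant.
X = np.array([
    [0.0, 0.3, 1.0],
    [0.1, 0.9, 1.0],
    [0.2, 0.1, 1.0],
    [5.0, 0.8, 1.0],
    [5.1, 0.2, 1.0],
    [5.2, 0.7, 1.0],
])

# Three illustrative clustering solutions with different numbers of clusters.
labelings = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 2, 2, 2],
    [0, 0, 0, 1, 1, 2],
]

consensus = consensus_matrix(labelings)
order, scores = rank_features(X, consensus)
print("feature ranking:", order)
```

Because the consensus matrix is averaged over solutions with different cluster counts, the sketch never needs a single "true" number of clusters, which is the property the abstract emphasizes. In practice the labelings would come from repeated runs of one or more clustering algorithms rather than being written by hand.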