• DocumentCode
    2527367
  • Title

    Similarity and cluster analysis algorithms for microarrays using R* trees

  • Author

    Pi, Jiaxiong ; Shi, Yong ; Chen, Zhengxin

  • Author_Institution
    Coll. of Inf. Sci. & Technol., Nebraska Univ., Omaha, NE, USA
  • fYear
    2005
  • fDate
    8-11 Aug. 2005
  • Firstpage
    91
  • Lastpage
    92
  • Abstract
    Similarity analysis and cluster analysis are important parts of analyzing microarray data. Viewing microarrays as time series data, we carry out both kinds of analysis by indexing the time series with R*-Trees. We have developed algorithms for similarity and cluster analysis on microarray data, and conducted experimental and comparative studies. First, our study shows that principal component analysis (PCA) is superior to several other methods (such as DFT and PAA) as far as distance conservation is concerned. We have developed a PCA-based similarity analysis tool that explores fewer R*-Tree nodes before finding similar counterparts and returns fewer false positives than other methods. In addition, we extend the R*-Tree's application to cluster analysis. With the aid of R*-Tree indexing, two clustering algorithms, KMeans-R and Hierarchy-R, are proposed as improved versions of K-Means and hierarchical clustering, respectively. Experiments on similarity search and cluster analysis based on the proposed algorithms have been carried out and have shown favorable results. Experiments on the yeast cell cycle dataset are reported in this paper.
  • Keywords
    arrays; biology computing; cellular biophysics; genetics; pattern clustering; principal component analysis; Hierarchy-R; K-Means; KMeans-R; PCA; R* tree; cluster analysis algorithm; clustering algorithm; distance conservation; hierarchical clustering; microarray data; principle components analysis; similarity analysis; time series data; yeast cell cycle dataset; Algorithm design and analysis; Clustering algorithms; Clustering methods; Data analysis; Fungi; Indexing; Information analysis; Information science; Principal component analysis; Time series analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computational Systems Bioinformatics Conference, 2005. Workshops and Poster Abstracts. IEEE
  • Print_ISBN
    0-7695-2442-7
  • Type

    conf

  • DOI
    10.1109/CSBW.2005.125
  • Filename
    1540553
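
The abstract compares reduction methods by "distance conservation", that is, how much of the original pairwise distance survives dimensionality reduction before the reduced vectors are indexed. The sketch below is an illustration of that idea only, not the authors' implementation: the synthetic random-walk data, the function names (pca_reduce, paa_reduce, conservation), and the choice of the mean reduced-to-original distance ratio as the measure are all assumptions, and no R*-Tree is built.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components."""
    Xc = X - X.mean(axis=0)  # centering leaves pairwise distances unchanged
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T  # rows of Vt are orthonormal directions


def paa_reduce(X, k):
    """Piecewise aggregate approximation with k equal-width segments.

    Segment means are scaled by sqrt(segment length) so that Euclidean
    distance in the reduced space lower-bounds the original distance.
    """
    n = X.shape[1]
    seg = n // k  # assumes n >= k; trailing leftover points are dropped
    means = X[:, :seg * k].reshape(len(X), k, seg).mean(axis=2)
    return means * np.sqrt(seg)


def pairwise_dist(Z):
    """Dense matrix of Euclidean distances between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def conservation(X, Z):
    """Ratios of reduced to original distance over all distinct pairs."""
    iu = np.triu_indices(len(X), k=1)
    return pairwise_dist(Z)[iu] / pairwise_dist(X)[iu]


# Hypothetical stand-in for expression profiles: 50 random walks of length 16.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 16)).cumsum(axis=1)

Z_pca = pca_reduce(X, 4)
Z_paa = paa_reduce(X, 4)
ratio_pca = conservation(X, Z_pca)
ratio_paa = conservation(X, Z_paa)

print("PCA mean ratio:", ratio_pca.mean())
print("PAA mean ratio:", ratio_paa.mean())
```

Both reductions never overestimate a distance, so every ratio is at most 1; a ratio closer to 1 means the reduced-space distance is a tighter lower bound. Under that reading, a tighter bound would be expected to prune more index nodes and admit fewer false positives, which is the kind of effect the abstract reports for PCA.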