DocumentCode
2527367
Title
Similarity and cluster analysis algorithms for microarrays using R* trees
Author
Pi, Jiaxiong ; Shi, Yong ; Chen, Zhengxin
Author_Institution
Coll. of Inf. Sci. & Technol., Nebraska Univ., Omaha, NE, USA
fYear
2005
fDate
8-11 Aug. 2005
Firstpage
91
Lastpage
92
Abstract
Similarity and cluster analysis are important aspects of analyzing microarray data. Viewing microarrays as time series data, we carry out both similarity analysis and cluster analysis by indexing the time series with R*-Trees. We have developed algorithms for similarity and cluster analysis on microarray data, and conducted experimental and comparative studies. First, our study shows that principal component analysis (PCA) is superior to several other methods (such as DFT and PAA) with respect to distance conservation. A similarity analysis tool based on PCA has been developed; it explores fewer R*-Tree nodes before finding similar counterparts and returns fewer false positives than the other methods. In addition, we extend the application of R*-Trees to cluster analysis. With the aid of R*-Tree indexing, two clustering algorithms, KMeans-R and Hierarchy-R, are proposed as improved versions of K-Means and hierarchical clustering, respectively. Experiments on similarity search and cluster analysis using the proposed algorithms have been carried out and show favorable results. Experiments on the yeast cell cycle dataset are reported in this paper.
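The similarity search described in the abstract follows a filter-and-refine pattern: series are reduced with PCA, candidates are found in the reduced space, and exact distances then discard false positives. The sketch below is not the authors' tool. It is a minimal illustration that assumes Euclidean distance, computes PCA in pure Python by power iteration, and uses a linear scan where the paper uses an R*-tree. Because the retained principal axes are kept orthonormal, the distance between two reduced series never exceeds their true distance, so the filter step cannot miss a true match. All function names (`range_query`, `top_components`, `project`, and so on) are hypothetical.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_center(rows):
    """Return the column means and a mean-centered copy of the rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return means, [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Covariance matrix (divided by n) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]


def orthonormalize(v, basis):
    """Gram-Schmidt v against an orthonormal basis; None if v collapses to zero."""
    for b in basis:
        dot = sum(x * y for x, y in zip(v, b))
        v = [x - dot * y for x, y in zip(v, b)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 1e-12 else None


def top_components(cov, k, iters=200, seed=0):
    """Approximate the top-k principal axes by power iteration, kept orthonormal."""
    rng = random.Random(seed)
    comps = []
    for _ in range(k):
        v = orthonormalize([rng.random() + 0.1 for _ in cov], comps)
        if v is None:
            break
        for _ in range(iters):
            w = orthonormalize([sum(c * x for c, x in zip(row, v)) for row in cov], comps)
            if w is None:  # cov maps v to (near) zero; keep the last orthonormal v
                break
            v = w
        comps.append(v)
    return comps


def project(row, means, comps):
    """Coordinates of a centered series in the reduced PCA space."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in comps]


def range_query(series, query, eps, k):
    """Filter-and-refine range search: reduced-space filter, then exact check."""
    means, centered = mean_center(series)
    comps = top_components(covariance(centered), k)
    reduced = [project(s, means, comps) for s in series]
    q_red = project(query, means, comps)
    # Filter: a linear scan stands in for the R*-tree over the reduced vectors.
    candidates = [i for i, r in enumerate(reduced) if euclid(r, q_red) <= eps]
    # Refine: exact distances discard the false positives the filter let through.
    hits = [i for i in candidates if euclid(series[i], query) <= eps]
    return candidates, hits
```

Here `len(candidates) - len(hits)` counts the false positives, the quantity the abstract compares across reduction methods. With nested orthonormal axes, keeping more components can only increase reduced distances, so the candidate set shrinks or stays the same at the cost of a higher-dimensional index.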
Keywords
arrays; biology computing; cellular biophysics; genetics; pattern clustering; principal component analysis; Hierarchy-R; K-Means; KMeans-R; PCA; R* tree; cluster analysis algorithm; clustering algorithm; distance conservation; hierarchical clustering; microarray data; principle components analysis; similarity analysis; time series data; yeast cell cycle dataset; Algorithm design and analysis; Clustering algorithms; Clustering methods; Data analysis; Fungi; Indexing; Information analysis; Information science; Principal component analysis; Time series analysis;
fLanguage
English
Publisher
ieee
Conference_Titel
Computational Systems Bioinformatics Conference, 2005. Workshops and Poster Abstracts. IEEE
Print_ISBN
0-7695-2442-7
Type
conf
DOI
10.1109/CSBW.2005.125
Filename
1540553