DocumentCode :
2527367
Title :
Similarity and cluster analysis algorithms for microarrays using R* trees
Author :
Pi, Jiaxiong ; Shi, Yong ; Chen, Zhengxin
Author_Institution :
Coll. of Inf. Sci. & Technol., Nebraska Univ., Omaha, NE, USA
fYear :
2005
fDate :
8-11 Aug. 2005
Firstpage :
91
Lastpage :
92
Abstract :
Similarity analysis and cluster analysis are important aspects of microarray data analysis. Viewing microarrays as time series data, we carry out both similarity analysis and cluster analysis by indexing the time series with R*-Trees. We have developed algorithms for similarity and cluster analysis on microarray data and conducted experimental and comparative studies. First, our study shows that principal component analysis (PCA) outperforms several other methods (such as DFT and PAA) as far as distance conservation is concerned. A similarity analysis tool based on PCA has been developed; it explores fewer R*-Tree nodes before finding a query's similar counterparts and returns fewer false positives than the other methods. In addition, we extend the application of R*-Trees to cluster analysis. With the aid of R*-Tree indexing, two clustering algorithms, KMeans-R and Hierarchy-R, are proposed as improved versions of K-Means and hierarchical clustering, respectively. Experiments on similarity search and cluster analysis based on the proposed algorithms have been carried out and have shown favorable results. Experiments on the yeast cell cycle dataset are reported in this paper.
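The distance-conservation comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`pca_reduce`, `paa_reduce`, `conservation_ratio`), the synthetic random-walk data, and the choice of "mean ratio of reduced to original Euclidean distance" as the conservation measure are all assumptions made for illustration. Both reductions below are orthogonal projections, so the reduced distance never exceeds the original one; a ratio closer to 1 means distances are better conserved.

```python
import numpy as np

def pca_reduce(X, k):
    """Project rows of X onto the top-k principal components."""
    centered = X - X.mean(axis=0)
    # Rows of vt are orthonormal principal directions, sorted by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

def paa_reduce(X, k):
    """Piecewise aggregate approximation: k segment means per series,
    scaled by sqrt(segment length) so distances lower-bound the originals."""
    n = X.shape[1]
    if n % k != 0:
        raise ValueError("series length must be divisible by k")
    seg = n // k
    return X.reshape(X.shape[0], k, seg).mean(axis=2) * np.sqrt(seg)

def pairwise_dist(Z):
    """Euclidean distance between every pair of rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def conservation_ratio(X, Z):
    """Mean of (reduced distance / original distance) over distinct pairs.
    This measure is an assumption chosen for the sketch."""
    upper = np.triu_indices(X.shape[0], k=1)
    return float((pairwise_dist(Z)[upper] / pairwise_dist(X)[upper]).mean())

# Synthetic stand-in for expression profiles: 200 series of 16 time points.
rng = np.random.default_rng(0)
X = np.cumsum(rng.normal(size=(200, 16)), axis=1)

pca_ratio = conservation_ratio(X, pca_reduce(X, 4))
paa_ratio = conservation_ratio(X, paa_reduce(X, 4))
print(f"PCA conservation: {pca_ratio:.3f}, PAA conservation: {paa_ratio:.3f}")
```

Because the reduced distance is a lower bound on the true distance, a search over the reduced vectors (for example, through an R*-Tree index) can prune candidates without missing true matches; tighter bounds mean fewer false positives to verify against the full series.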
Keywords :
arrays; biology computing; cellular biophysics; genetics; pattern clustering; principal component analysis; Hierarchy-R; K-Means; KMeans-R; PCA; R* tree; cluster analysis algorithm; clustering algorithm; distance conservation; hierarchical clustering; microarray data; principle components analysis; similarity analysis; time series data; yeast cell cycle dataset; Algorithm design and analysis; Clustering algorithms; Clustering methods; Data analysis; Fungi; Indexing; Information analysis; Information science; Principal component analysis; Time series analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Systems Bioinformatics Conference, 2005. Workshops and Poster Abstracts. IEEE
Print_ISBN :
0-7695-2442-7
Type :
conf
DOI :
10.1109/CSBW.2005.125
Filename :
1540553