DocumentCode :
2719724
Title :
Quality-based distance measures and applications to clustering
Author :
Taverna, Darin M. ; Brun, Marcel ; Dougherty, Edward R. ; Chen, Yidong
Author_Institution :
Translational Genomics Res. Inst., Phoenix, AZ
fYear :
2006
fDate :
38899
Firstpage :
1
Lastpage :
2
Abstract :
When analyzing biological data sets, a common approach is to partition the data into clusters. Examples include finding a subset of genes with co-regulated expression across experiments, grouping similar disease phenotypes, or implicating regions of genetic variation in disease. The ability to separate the data into subsets depends on the structure of the distribution of points and on the choice of clustering algorithm. Furthermore, the biological relevance of the clustering results is biased by the variation among the data points themselves. We introduce a mathematical quality-based distance metric that allows all data, regardless of their error, to be included in the analysis without introducing a cutoff. This removes the need to exclude points or to change the dimensionality. The advantage of this approach is shown by clustering simulated data with added noise.
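The abstract does not give the metric's formula. The following is a minimal sketch that only illustrates the general idea of weighting by measurement quality instead of applying a cutoff. It is NOT the authors' metric. The function name `quality_weighted_distance` and the per-coordinate quality scores `qx`, `qy` (assumed to lie in [0, 1]) are hypothetical. Every coordinate is kept, so the dimensionality never changes; low-quality coordinates simply contribute less.

```python
import math
from typing import Sequence


def quality_weighted_distance(
    x: Sequence[float],
    y: Sequence[float],
    qx: Sequence[float],
    qy: Sequence[float],
) -> float:
    """Hypothetical quality-weighted Euclidean distance (illustration only).

    Each coordinate's squared difference is weighted by the product of the
    two points' quality scores (assumed in [0, 1]), so unreliable
    measurements are down-weighted rather than dropped by a cutoff. The
    weighted sum is rescaled by n / sum(weights) so the result stays on the
    same scale as an ordinary Euclidean distance over all n coordinates.
    """
    if not (len(x) == len(y) == len(qx) == len(qy)):
        raise ValueError("all inputs must have the same length")
    n = len(x)
    # Per-coordinate weight: both measurements must be reliable to count fully.
    weights = [a * b for a, b in zip(qx, qy)]
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one coordinate must have nonzero quality")
    weighted_sq = sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))
    return math.sqrt(weighted_sq * n / total_weight)
```

With all quality scores equal to 1, the weights are all 1 and the function reduces to the ordinary Euclidean distance. A pairwise matrix of such distances could then be passed to any clustering method that accepts precomputed distances.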
Keywords :
cellular biophysics; diseases; genetics; medical computing; molecular biophysics; noise; statistical analysis; biological data sets; clustering algorithm; co-regulated gene expression; disease phenotypes; genetic variation; noise; quality-based distance measures; Analytical models; Bioinformatics; Clustering algorithms; Computational modeling; Data analysis; Diseases; Euclidean distance; Gene expression; Genetics; Genomics
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Life Science Systems and Applications Workshop, 2006. IEEE/NLM
Conference_Location :
Bethesda, MD
Print_ISBN :
1-4244-0277-8
Electronic_ISBN :
1-4244-0278-6
Type :
conf
DOI :
10.1109/LSSA.2006.250390
Filename :
4015791