Title :
A method for evaluating the quality of string dissimilarity measures and clustering algorithms for EST clustering
Author :
Zimmermann, Judith; Lipták, Zsuzsanna; Hazelhurst, Scott
Author_Institution :
Res. Group "Algorithms, Data Structures, & Applications", Inst. of Theor. Comput. Sci., Zurich, Switzerland
Abstract :
We present a method for evaluating the suitability of different string dissimilarity measures and clustering algorithms for EST clustering, one of the main techniques used in transcriptome projects. The method consists of generating simulated ESTs with user-specified parameters and then evaluating the quality of the clusterings produced with different dissimilarity measures and clustering algorithms. We implemented two tools for this purpose: ESTSim (EST simulator), which generates simulated EST sequences from mRNAs/cDNAs according to user-specified parameters, and ECLEST (evaluator for clusterings of ESTs), which computes and evaluates a clustering of a set of input ESTs, where the dissimilarity measure, the clustering algorithm, and the clustering validity index can each be specified independently. We demonstrate the method on a sample of 699 cDNAs, from which we generated approximately 16,000 simulated ESTs. In two experiments on these data, we obtained statistically significant results comparing subword-based dissimilarity measures with alignment-based ones.
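The evaluation pipeline described in the abstract (simulate ESTs from known cDNAs, cluster them under a chosen dissimilarity measure and algorithm, then score the clustering against the known origin of each EST) can be illustrated with a small Python sketch. This is not ESTSim or ECLEST code, and every name below is hypothetical. The substitution-only simulator, the 3-gram profile distance (standing in for a subword-based measure), unit-cost edit distance (standing in for an alignment-based measure), threshold single-linkage clustering, and the Rand index as the validity index are all illustrative assumptions, not the choices reported in the paper.

```python
import random
from collections import Counter
from itertools import combinations

def simulate_ests(cdnas, n_per_cdna, length, error_rate, seed=0):
    """Hypothetical simulator: cut fixed-length fragments from each cDNA and
    apply random substitutions at a per-base error rate (no indels modelled)."""
    rng = random.Random(seed)
    ests, labels = [], []
    for gene_id, seq in enumerate(cdnas):
        for _ in range(n_per_cdna):
            start = rng.randrange(0, max(1, len(seq) - length + 1))
            frag = list(seq[start:start + length])
            for i, base in enumerate(frag):
                if rng.random() < error_rate:
                    frag[i] = rng.choice([b for b in "ACGT" if b != base])
            ests.append("".join(frag))
            labels.append(gene_id)  # true cluster = source cDNA
    return ests, labels

def qgram_distance(s, t, q=3):
    """Subword-based measure (assumed): L1 distance between q-gram count profiles."""
    ps = Counter(s[i:i + q] for i in range(len(s) - q + 1))
    pt = Counter(t[i:i + q] for i in range(len(t) - q + 1))
    return sum(abs(ps[g] - pt[g]) for g in ps.keys() | pt.keys())

def edit_distance(s, t):
    """Alignment-based measure (assumed): unit-cost Levenshtein distance."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

def single_linkage(items, dist, threshold):
    """Link every pair with dist <= threshold (union-find); return cluster ids."""
    parent = list(range(len(items)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i, j in combinations(range(len(items)), 2):
        if dist(items[i], items[j]) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(items))]

def rand_index(pred, truth):
    """Validity index (assumed): fraction of EST pairs on which both partitions agree."""
    pairs = list(combinations(range(len(pred)), 2))
    agree = sum((pred[i] == pred[j]) == (truth[i] == truth[j]) for i, j in pairs)
    return agree / len(pairs)

if __name__ == "__main__":
    # Toy run on random "cDNAs"; thresholds are arbitrary illustrative values.
    rng = random.Random(1)
    cdnas = ["".join(rng.choice("ACGT") for _ in range(120)) for _ in range(3)]
    ests, truth = simulate_ests(cdnas, n_per_cdna=4, length=100, error_rate=0.01)
    for name, dist, thr in [("q-gram", qgram_distance, 40), ("edit", edit_distance, 30)]:
        pred = single_linkage(ests, dist, thr)
        print(name, round(rand_index(pred, truth), 3))
```

Because the simulator records which cDNA each EST came from, any combination of dissimilarity measure, clustering algorithm, and validity index can be scored against that ground truth. Keeping the three as interchangeable parameters mirrors the independence the abstract describes for ECLEST.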
Keywords :
biology computing; genetics; pattern clustering; sequences; EST clustering; clustering algorithms; expressed sequence tags; string dissimilarity; transcriptome; Africa; Bioinformatics; Clustering algorithms; Computational modeling; Computer science; DNA; Data structures; Genomics; Pollution measurement; Sequences;
Conference_Title :
Proceedings of the Fourth IEEE Symposium on Bioinformatics and Bioengineering (BIBE 2004)
Print_ISBN :
0-7695-2173-8
DOI :
10.1109/BIBE.2004.1317357