• DocumentCode
    3230166
  • Title

    A method for evaluating the quality of string dissimilarity measures and clustering algorithms for EST clustering

  • Author

    Zimmermann, Judith ; Lipták, Zsuzsanna ; Hazelhurst, Scott

  • Author_Institution
    Res. Group "Algorithms, Data Structures, & Applications", Inst. of Theor. Comput. Sci., Zurich, Switzerland
  • fYear
    2004
  • fDate
    19-21 May 2004
  • Firstpage
    301
  • Lastpage
    309
  • Abstract
    We present a method for evaluating the suitability of different string dissimilarity measures and clustering algorithms for EST clustering, one of the main techniques used in transcriptome projects. The method comprises generating simulated ESTs with user-specified parameters, and then evaluating the quality of clusterings produced when different dissimilarity measures and different clustering algorithms are used. We implemented two tools to do this: ESTSim (EST simulator), which generates simulated EST sequences from mRNAs/cDNAs using user-specified parameters, and ECLEST (evaluator for clusterings of ESTs), which computes and evaluates a clustering of a set of input ESTs, where the dissimilarity measure, the clustering algorithm, and the clustering validity index can be specified independently. We demonstrate the method on a sample of 699 cDNAs, generating approximately 16,000 simulated ESTs. We conducted two experiments and derived statistically significant results from this study comparing subword-based dissimilarity measures to alignment-based ones.
  • Keywords
    biology computing; genetics; pattern clustering; sequences; EST clustering; clustering algorithms; expressed sequence tags; string dissimilarity; transcriptome; Africa; Bioinformatics; Clustering algorithms; Computational modeling; Computer science; DNA; Data structures; Genomics; Pollution measurement; Sequences;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Bioengineering, 2004. BIBE 2004. Proceedings. Fourth IEEE Symposium on
  • Print_ISBN
    0-7695-2173-8
  • Type

    conf

  • DOI
    10.1109/BIBE.2004.1317357
  • Filename
    1317357
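
The abstract describes clustering ESTs under a chosen dissimilarity measure and then scoring the clustering with a validity index. As a rough illustration only, and not the ESTSim or ECLEST implementation, the Python sketch below assumes a simple subword-based measure (squared Euclidean distance between k-mer count vectors), threshold-based single-linkage clustering, and the Rand index as the validity index. All function names, the value k=3, and the threshold are hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k=3):
    """Count every overlapping length-k subword of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def subword_dissimilarity(a, b, k=3):
    """Squared Euclidean distance between the k-mer count vectors of a and b.

    Assumption: this stands in for 'a subword-based measure'; the paper's
    actual measures are not specified in this record.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def single_linkage(seqs, threshold, k=3):
    """Put two sequences in one cluster if a chain of pairs with
    dissimilarity <= threshold connects them (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if subword_dissimilarity(seqs[i], seqs[j], k) <= threshold:
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(seqs))]


def rand_index(labels, truth):
    """Fraction of sequence pairs on which two clusterings agree."""
    pairs = list(combinations(range(len(labels)), 2))
    agree = sum((labels[i] == labels[j]) == (truth[i] == truth[j])
                for i, j in pairs)
    return agree / len(pairs)


# Toy input: two overlapping fragments from each of two made-up "transcripts".
seqs = ["ACGTACGTAC", "CGTACGTACG", "TTTTGGGGCC", "TTTGGGGCCC"]
truth = [0, 0, 1, 1]
labels = single_linkage(seqs, threshold=4)
score = rand_index(labels, truth)
```

With simulated ESTs the true transcript of origin is known for every sequence, which is what makes an external index such as the one assumed here computable.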