DocumentCode :
2341487
Title :
High similarity sequence comparison in clustering large sequence databases
Author :
Dudoignon, Lorie ; Glemet, Eric ; Heus, Hendrik Cornelis ; Raffinot, Mathieu
Author_Institution :
IMT, INRIA, Marseille, France
fYear :
2002
fDate :
2002
Firstpage :
228
Lastpage :
236
Abstract :
We present a fast algorithm for sequence clustering and searching which works with large sequence databases. It uses a strictly defined similarity measure. The algorithm is faster than conventional EST clustering approaches because its complexity is directly related to the number of subwords shared by the sequences. Furthermore, the algorithm also works with proteic sequences and large sequences like entire chromosomes. We present a theoretical study of our approach and provide experimental results.
Keywords :
biology computing; cellular biophysics; computational complexity; genetics; molecular biophysics; pattern clustering; scientific information systems; sequences; very large databases; chromosomes; complexity; fast algorithm; high similarity sequence comparison; large sequence database clustering; proteic sequences; sequence searching; similarity measure; subwords; Bioinformatics; Chromium; Computer Society; Databases
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics Conference, 2002. Proceedings. IEEE Computer Society
Print_ISBN :
0-7695-1653-X
Type :
conf
DOI :
10.1109/CSB.2002.1039345
Filename :
1039345