DocumentCode :
2240696
Title :
FLASH: a fast look-up algorithm for string homology
Author :
Califano, Andrea ; Rigoutsos, Isidore
Author_Institution :
IBM T.J. Watson Res. Center, Yorktown Heights, NY, USA
fYear :
1993
fDate :
15-17 Jun 1993
Firstpage :
353
Lastpage :
359
Abstract :
A key issue in managing large amounts of data is the availability of efficient, accurate, and selective techniques to detect homology (similarity) between newly recovered and previously acquired sequences. The algorithm presented is based on a probabilistic indexing framework which requires minimal access to the database for each match. A highly redundant number of descriptive tuples from the sequences of interest are generated and used as indices in a table look-up paradigm. Theoretical and experimental results on the sensitivity and accuracy of the approach are provided. These include the probability of correct and random matches and the storage and computational requirements. An experimental system is implemented for a database containing the complete genome of the bacterium E. coli (approximately 2 million nucleotides). Search time is a few seconds on a workstation-class machine. The algorithm is shown to scale well to databases containing billions of nucleotides, with performance that is orders of magnitude better than the fastest of the current techniques.
Keywords :
biology computing; cellular biophysics; database management systems; macromolecules; probability; table lookup; E. Coli; FLASH; bacteria; database; fast look-up algorithm for string homology; genome; nucleotides; probabilistic indexing framework; redundancy; sequence similarity; table look-up; Algorithm design and analysis; Bioinformatics; Databases; Genomics; Image databases; Indexing; Microorganisms; Rodents; Table lookup; Testing; Time sharing computer systems; Workstations;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Vision and Pattern Recognition, 1993. Proceedings CVPR '93., 1993 IEEE Computer Society Conference on
Conference_Location :
New York, NY
ISSN :
1063-6919
Print_ISBN :
0-8186-3880-X
Type :
conf
DOI :
10.1109/CVPR.1993.341106
Filename :
341106