DocumentCode :
1184526
Title :
Searching in parallel for similar strings [biological sequences]
Author :
Rigoutsos, Isidore ; Califano, Andrea
Author_Institution :
IBM Thomas J. Watson Res. Center, Yorktown Heights, NY, USA
Volume :
1
Issue :
2
fYear :
1994
Firstpage :
60
Lastpage :
75
Abstract :
Distributed computation, probabilistic indexing and hashing techniques combine to create a novel approach to processing very large biological-sequence databases. Other data-intensive tasks could also benefit. Our indexing-based approach enables fast similarity searching through a large database of strings. Thanks to a redundant table-lookup scheme, recovering database items that match a test sequence requires minimal data access. We have implemented a uniprocessor version of this approach called Flash (Fast Lookup Algorithm for String Homology) as well as a distributed version, dFlash, using a cluster of seven non-dedicated workstations connected through a local area network. In this article, we present an approach for retrieving homologies in databases of proteins.
Keywords :
biology computing; distributed algorithms; file organisation; indexing; proteins; very large databases; Fast Lookup Algorithm for String Homology; Flash; biological-sequence databases; dFlash; data-intensive tasks; database item recovery; distributed computation; fast similarity searching; hashing techniques; local area network; minimal data access; nondedicated workstation cluster; parallel searching; probabilistic indexing; proteins; redundant table-lookup scheme; similar strings; uniprocessor version; Biology computing; Clustering algorithms; Distributed computing; Distributed databases; Indexing; Information retrieval; Local area networks; Proteins; Testing; Workstations;
fLanguage :
English
Journal_Title :
Computational Science & Engineering, IEEE
Publisher :
IEEE
ISSN :
1070-9924
Type :
jour
DOI :
10.1109/99.326666
Filename :
326666