DocumentCode :
2790316
Title :
Fast searching in biological sequences using multiple hash functions
Author :
Faro, S. ; Lecroq, Thierry
Author_Institution :
Dip. di Mat. e Inf., Univ. di Catania, Catania, Italy
fYear :
2012
fDate :
11-13 Nov. 2012
Firstpage :
175
Lastpage :
180
Abstract :
With the availability of large amounts of DNA data, exact matching of nucleotide sequences has become an important application in modern computational biology and in meta-genomics. In this paper we present an efficient method based on multiple hashing functions that improves the performance of existing string matching algorithms when used for searching DNA sequences. Our experimental results show that the proposed technique leads to algorithms that are up to 8 times faster than the best known algorithm for matching multiple patterns. The performance gain is also larger when searching for larger sets of patterns. Thus, given that the number of reads produced by next-generation sequencing equipment is ever growing, the new technique serves as a good basis for massive multiple long pattern search applications.
Keywords :
DNA; biology computing; file organisation; genomics; molecular biophysics; string matching; text analysis; DNA data; DNA sequence searching; biological sequences; computational biology; exact nucleotide sequence matching; fast searching; meta-genomics; molecular biology; multiple hashing functions; nucleotide sequences; performance improvement; sequencing equipments; string matching algorithms; text processing; Bioinformatics; Computers; DNA; Genomics; Pattern matching; Standards; DNA searching; biological sequences; hashing algorithms; string matching; text processing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics & Bioengineering (BIBE), 2012 IEEE 12th International Conference on
Conference_Location :
Larnaca
Print_ISBN :
978-1-4673-4357-2
Type :
conf
DOI :
10.1109/BIBE.2012.6399669
Filename :
6399669