DocumentCode :
3497217
Title :
Fast algorithms for finding patterns in indeterminate and Arc-Annotated sequences
Author :
Aumi, Md Tanvir Islam ; Moosa, Tanaeem M. ; Rahman, M. Sohel
Author_Institution :
Dept. of Comput. Sci. & Eng., Bangladesh Univ. of Eng. & Technol., Dhaka, Bangladesh
fYear :
2011
fDate :
22-24 Dec. 2011
Firstpage :
71
Lastpage :
76
Abstract :
In this paper, we present efficient algorithms for finding indeterminate Arc-Annotated patterns in indeterminate Arc-Annotated references. Our algorithms run in O(m + mn/w) time, where n and m are the lengths of the reference and pattern strings, respectively, and w is the word size of the target machine. We assume a constant alphabet size, since indeterminate Arc-Annotated sequences are used to model biological sequences. Consequently, for short patterns (m = O(w)), our algorithms run in linear time, and efficient algorithms for matching short patterns against reference genomes have many practical applications. We also report preliminary experiments suggesting that our algorithms are fast in practice.
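The record does not describe the algorithm itself, but a bound of the form mn/w with machine word size w is characteristic of bit-parallel matching. The following is a minimal sketch of textbook Shift-And adapted to indeterminate strings, offered only as an illustration and not as the authors' method. It assumes each position is given as a string of allowed DNA bases (e.g. "CG" for C or G), that a pattern position matches a reference position when their base sets intersect, and it ignores arc annotations entirely. The function names `build_masks` and `indeterminate_shift_and` are hypothetical.

```python
from typing import Dict, List, Sequence


def build_masks(pattern: Sequence[str]) -> Dict[str, int]:
    """For each base c, set bit j when c is allowed at pattern position j.

    With a constant-size alphabet (assumed here: DNA bases) this costs O(m).
    """
    masks: Dict[str, int] = {}
    for j, allowed in enumerate(pattern):
        for c in allowed:
            masks[c] = masks.get(c, 0) | (1 << j)
    return masks


def indeterminate_shift_and(text: Sequence[str], pattern: Sequence[str]) -> List[int]:
    """Return start offsets where the indeterminate pattern occurs in the text.

    Assumption: positions match when their allowed-base sets intersect.
    Arc annotations are not modelled in this sketch.
    """
    m = len(pattern)
    if m == 0:
        return []
    masks = build_masks(pattern)
    accept = 1 << (m - 1)  # bit set when the whole pattern has matched
    state = 0
    hits: List[int] = []
    for i, allowed in enumerate(text):
        # Bits of pattern positions compatible with this reference position.
        column = 0
        for c in allowed:
            column |= masks.get(c, 0)
        # Extend every partial match by one position (and start a new one).
        state = ((state << 1) | 1) & column
        if state & accept:
            hits.append(i - m + 1)
    return hits


if __name__ == "__main__":
    print(indeterminate_shift_and(list("ACGTAC"), ["A", "CG"]))
```

On a real machine, `state` and each mask would occupy ⌈m/w⌉ words, so each reference position costs O(⌈m/w⌉) word operations; Python's arbitrary-precision integers only emulate this. For the usage example above, the pattern "A" followed by {C, G} occurs at offsets 0 and 4, so the script prints `[0, 4]`.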
Keywords :
biology computing; computational complexity; genomics; string matching; alphabet size; arc-annotated sequence; biological sequence model; indeterminate arc-annotated pattern; indeterminate arc-annotated reference; linear time algorithm; pattern matching; pattern string; reference genomes; target machine word size; DNA; Arc-Annotated sequence; Degenerate sequence; Indeterminate sequence; Short pattern; String matching;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer and Information Technology (ICCIT), 2011 14th International Conference on
Conference_Location :
Dhaka
Print_ISBN :
978-1-61284-907-2
Type :
conf
DOI :
10.1109/ICCITechn.2011.6164876
Filename :
6164876