DocumentCode :
1092814
Title :
Superiority of Spaced Seeds for Homology Search
Author :
Zhang, Louxin
Author_Institution :
Nat. Univ. of Singapore, Singapore
Volume :
4
Issue :
3
fYear :
2007
Firstpage :
496
Lastpage :
505
Abstract :
In homology search, good spaced seeds have higher sensitivity for the same cost (weight). However, elucidating the mechanism that confers power to spaced seeds and characterizing optimal spaced seeds still remain unsolved. This paper investigates these two important open questions by formally analyzing the average number of nonoverlapping hits and the hit probability of a spaced seed in the Bernoulli sequence model. We prove that, when the length of a nonuniformly spaced seed is bounded above by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed of the same weight in both 1) the average number of nonoverlapping hits and 2) the asymptotic hit probability. This clearly answers the first problem mentioned above in the Bernoulli sequence model. The theoretical study in this paper also gives a new solution to finding long optimal seeds.
Keywords :
biology computing; Bernoulli sequence model; bioinformatics; homology search; long optimal seed; seed weight; spaced seeds; Bioinformatics; Costs; DNA; Databases; Filtration; Pattern matching; Probability; Protein sequence; Statistics; Homology search; pattern matching; renewal theory; run statistics; sequence alignment; spaced seeds; Algorithms; Pattern Recognition, Automated; Sequence Alignment; Sequence Analysis; Sequence Homology;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/tcbb.2007.1013
Filename :
4288075