DocumentCode :
1199688
Title :
Finding Significant Matches of Position Weight Matrices in Linear Time
Author :
Pizzi, Cinzia ; Rastas, Pasi ; Ukkonen, Esko
Author_Institution :
Dept. of Inf. Eng., Univ. degli Studi di Padova, Padova, Italy
Volume :
8
Issue :
1
fYear :
2011
Firstpage :
69
Lastpage :
79
Abstract :
Position weight matrices are an important method for modeling signals or motifs in biological sequences, both in DNA and protein contexts. In this paper, we present fast algorithms for the problem of finding significant matches of such matrices. Our algorithms are of the online type, and they generalize classical multipattern matching, filtering, and superalphabet techniques of combinatorial string matching to the problem of weight matrix matching. Several variants of the algorithms are developed, including multiple matrix extensions that perform the search for several matrices in one scan through the sequence database. Experimental performance evaluation is provided to compare the new techniques against each other as well as against some other online and index-based algorithms proposed in the literature. Compared to the brute-force O(mn) approach, our solutions can be faster by a factor that is proportional to the matrix length m. Our multiple-matrix filtration algorithm had the best performance in the experiments. On a current PC, this algorithm finds significant matches (p = 0.0001) of the 123 JASPAR matrices in the human genome in about 18 minutes.
Keywords :
DNA; bioinformatics; molecular biophysics; proteins; DNA; biological sequence; brute force approach; linear time; position weight matrices; protein; significant match; superalphabet technique; Biological system modeling; Context modeling; DNA; Databases; Filtering algorithms; Filtration; Humans; Matched filters; Proteins; Sequences; Position weight matrices; pattern search; position-specific scoring matrices; profiles; string matching.; Algorithms; Computational Biology; DNA; Humans; Pattern Recognition, Automated; Proteins; Sequence Analysis, DNA; Sequence Analysis, Protein;
fLanguage :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
ieee
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2009.35
Filename :
4803829
Link To Document :
بازگشت