Title :
Reference sequence selection for motif searches
Author :
Qiang Yu;Hongwei Huo; Ruixing Zhao; Dazheng Feng;Jeffrey Scott Vitter; Jun Huan
Author_Institution :
School of Computer Science and Technology, Xidian University, Xi´an, 710071, China
Abstract :
The planted (l, d) motif search (PMS) is an important yet challenging problem in computational biology. Pattern-driven PMS algorithms usually use k out of t input sequences as reference sequences to generate candidate motifs, and they can find all the (l, d) motifs in the input sequences. However, most of them simply take the first k sequences in the input as reference sequences without elaborate selection processes, and thus they may exhibit sharp fluctuations in running time, especially for large alphabets. In this paper, we build the reference sequence selection problem and propose a method named RefSelect to quickly solve it by evaluating the number of candidate motifs for the reference sequences. RefSelect can bring a practical time improvement of the state-of-the-art pattern-driven PMS algorithms. Experimental results show that RefSelect (1) makes the tested algorithms solve the PMS problem steadily in an efficient way, (2) particularly, makes them achieve a speedup of up to about 100× on the protein data, and (3) is also suitable for large data sets which contain hundreds or more sequences.
Conference_Titel :
Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference on
DOI :
10.1109/BIBM.2015.7359745