Title :
ACache: Using Caching to Improve the Performance of Multiple Sequence Alignments
Author :
Tu, Xun ; Claypool, Kajal T. ; Chen, Cindy X.
Author_Institution :
Dept. of Comput. Sci., Massachusetts Univ., Lowell, MA
Abstract :
Multiple sequence alignment represents a class of powerful bioinformatics tools with many uses in computational biology, ranging from the discovery of characteristic motifs and conserved regions in protein families to improved prediction of secondary and tertiary structure. Today, with rapidly growing data repositories offering scientists significantly more data with which to make better decisions, it is increasingly important to run these multiple alignment calculations as rapidly as possible. However, while several multiple alignment algorithms have been developed, these algorithms remain computationally expensive, taking as long as 2 to 3 days for some queries. In this paper, we propose a new caching technique to improve the performance of multiple sequence alignment algorithms. In particular, we propose a nested two-level cache hierarchy that provides caching of pairwise alignment results, a computationally expensive subcomponent of multiple sequence alignment algorithms. A key contribution of our work is the development of two novel cache replacement policies that closely track the scientist's query patterns over time. We present experimental results that validate the benefits of caching over the repeated computation of the alignments, provide heuristics for determining which alignments would benefit from caching, and show the effectiveness of the developed cache replacement policies.
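Illustrative sketch (not taken from the paper): the idea of caching pairwise alignment results in a two-level hierarchy can be pictured roughly as below. Everything here is an assumption made for illustration: the names pairwise_score and TwoLevelCache are hypothetical, the pairwise step is a plain Needleman-Wunsch score with arbitrary match/mismatch/gap values, and both levels use simple LRU eviction as a stand-in for the two replacement policies the abstract mentions, whose details are not given in this record.

```python
from collections import OrderedDict


def pairwise_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score via Needleman-Wunsch dynamic programming.

    Stand-in for the expensive pairwise step; scoring values are assumptions.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


class TwoLevelCache:
    """Hypothetical two-level cache of pairwise scores with LRU eviction.

    Entries evicted from level 1 are demoted to level 2; entries evicted
    from level 2 are dropped and must be recomputed on the next request.
    """

    def __init__(self, l1_size, l2_size):
        self.l1 = OrderedDict()
        self.l2 = OrderedDict()
        self.l1_size = l1_size
        self.l2_size = l2_size
        self.hits = 0
        self.misses = 0

    def get(self, a, b):
        # The score is symmetric under this scoring scheme, so order the key.
        key = (a, b) if a <= b else (b, a)
        if key in self.l1:
            self.l1.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.l1[key]
        if key in self.l2:
            value = self.l2.pop(key)  # promote back to level 1
            self.hits += 1
        else:
            value = pairwise_score(*key)  # cache miss: recompute
            self.misses += 1
        self._put_l1(key, value)
        return value

    def _put_l1(self, key, value):
        self.l1[key] = value
        if len(self.l1) > self.l1_size:
            old_key, old_value = self.l1.popitem(last=False)
            self.l2[old_key] = old_value  # demote least recently used entry
            if len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)
```

In this sketch a repeated request for the same pair of sequences, in either order, is answered from the cache instead of rerunning the dynamic program, which is the kind of saving the abstract attributes to caching over repeated computation.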
Keywords :
biology computing; cache storage; data warehouses; molecular biophysics; proteins; query processing; sequences; tree data structures; ACache; bioinformatics tool; cache replacement policy; computational biology; data caching; data repository; multiple sequence alignment algorithm; nested two level cache hierarchy; protein family; scientist query pattern; Bioinformatics; Biology computing; Computational biology; Computer science; Databases; Dynamic programming; Genomics; Heuristic algorithms; Protein engineering; Sequences;
Conference_Title :
Scientific and Statistical Database Management, 2006. 18th International Conference on
Conference_Location :
Vienna
Print_ISBN :
0-7695-2590-3
DOI :
10.1109/SSDBM.2006.9