DocumentCode :
3178060
Title :
BLAST Tree: Fast Filtering for Genomic Sequence Classification
Author :
King, Stuart ; Sun, Yanni ; Cole, James ; Pramanik, Sakti
Author_Institution :
Comput. Sci., Michigan State Univ., East Lansing, MI, USA
fYear :
2010
fDate :
May 31 2010-June 3 2010
Firstpage :
58
Lastpage :
65
Abstract :
With the advent of next-generation sequencing and culture-independent methods, we are now accumulating an enormous amount of metagenomic data from microbial communities. These data sets are large, hard to assemble, and might encode rare or novel proteins, posing new computational challenges for protein homology search. This paper presents a novel protein homology search algorithm that combines the salient features of pairwise sequence alignment programs such as Blast and protein-family-based tools such as Hmmer. It is optimized for protein annotation in metagenomic data sets because: 1) it is fast, 2) it can classify short protein fragments encoded by individual sequence reads, and 3) it can find homologs to novel or rare protein families when there are not enough member sequences to build a probabilistic model. Our algorithm builds a new indexing data structure called BlastTree, which can index a large sequence family database because of our effective compression techniques. In addition, BlastTree fully exploits sequence family membership information to improve homology search sensitivity. When the BlastTree Search algorithm is incorporated into Hmmer, it runs in a fraction of the time with comparable quality.
Keywords :
bioinformatics; data structures; genomics; indexing; pattern classification; probability; tree searching; BlastTree search algorithm; culture independent method; genomic sequence classification; indexing data structure; large sequence family database; metagenomic data sets; pairwise sequence alignment programs; probabilistic model; protein homology search algorithm; salient features; Assembly; Bioinformatics; Classification tree analysis; Data structures; Filtering; Genomics; Hidden Markov models; Indexes; Indexing; Proteins; Bioinformatics; Blast; Hmmer; Homology; Metagenomic; Trie
fLanguage :
English
Publisher :
IEEE
Conference_Title :
BioInformatics and BioEngineering (BIBE), 2010 IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4244-7494-3
Type :
conf
DOI :
10.1109/BIBE.2010.74
Filename :
5521711