DocumentCode :
1988249
Title :
IEEE 7th BIBE Research Tutorial Lecture: Decoding Novel Genomes: From Microbiomes to the Eukaryota
Author :
Borodovsky, Mark
Author_Institution :
Georgia Inst. of Technol., Atlanta
fYear :
2007
fDate :
14-17 Oct. 2007
Firstpage :
3
Lastpage :
3
Abstract :
One of the main goals of computational genomics is fast and accurate biological interpretation of newly sequenced genomic DNA. The complexity of this task varies among genomes, but it is never simple. Currently, a custom annotation pipeline is built for each new genome by integrating ab initio and comparative genomic methods. Even so, a consistent solution to the jigsaw puzzle of genome annotation frequently requires additional experimental effort (such as EST/cDNA sequencing). Current ab initio gene-finding algorithms use statistical analysis and optimization to solve the gene identification problem, restated as a search for the optimal parse of the genomic sequence into fragments with distinct statistical characteristics. This problem setting leads to a classic dynamic programming task: finding an optimal path through a network with weights (scores) assigned to its nodes and edges. The assignment of these weights plays a critical role and may present a significant challenge. This task is equivalent to estimating the parameters of the statistical models (hidden Markov models) that represent the mosaic of functional sequences and sites in a given genome. The task is rather easy when large sets of validated training sequences are available. However, this is not the case for the hundreds of genome sequencing and annotation projects currently under way. In the lecture we will consider general schemes of ab initio gene prediction and discuss the estimation of model parameters without a training set. We will show that this unsupervised approach is feasible and is becoming very important for two rapidly developing branches of genomics: (i) prokaryotic metagenomes, which are becoming a rich source of information about uncultivated microbial species, and (ii) "compact" eukaryotic genomes, such as those of fungi, whose relatively small size (less than 50 Mb) allows the complete genome sequence to be obtained in a relatively short time.
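The abstract frames gene finding as a dynamic programming search for the optimal parse, with scores taken from HMM parameters that may have to be estimated without a training set. The following is a minimal Python sketch of both ideas, assuming a toy two-state HMM (hypothetical states `N` for non-coding and `C` for coding, single-nucleotide emissions, hand-picked starting parameters); the function names `viterbi`, `reestimate`, and `self_train` are illustrative only. The unsupervised step is plain Viterbi training (alternating decoding and re-counting), a generic technique and not necessarily the procedure presented in the lecture. Real gene finders use far richer models, such as three-periodic, higher-order Markov chains for coding regions.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical two-state toy model: "N" = non-coding, "C" = coding.
STATES = ("N", "C")
ALPHABET = "ACGT"
Table = Dict[str, Dict[str, float]]


def viterbi(seq: str, start: Dict[str, float], trans: Table, emit: Table) -> List[str]:
    """Return the highest-scoring state path (the optimal parse), in log space."""
    # Trellis nodes are (position, state): node weights are log emission
    # probabilities, edge weights are log transition probabilities.
    score = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for state s at position i.
            prev = max(STATES, key=lambda p: score[i - 1][p] + math.log(trans[p][s]))
            score[i][s] = (score[i - 1][prev] + math.log(trans[prev][s])
                           + math.log(emit[s][seq[i]]))
            back[i][s] = prev
    # Trace back from the best final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for i in range(len(seq) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def reestimate(seq: str, path: List[str], pseudo: float = 1.0) -> Tuple[Table, Table]:
    """Re-estimate transition and emission tables by counting along a parse."""
    trans = {p: {s: pseudo for s in STATES} for p in STATES}
    emit = {s: {a: pseudo for a in ALPHABET} for s in STATES}
    for i, (state, base) in enumerate(zip(path, seq)):
        emit[state][base] += 1
        if i > 0:
            trans[path[i - 1]][state] += 1
    # Normalise each row of counts into probabilities.
    for table in (trans, emit):
        for row in table.values():
            total = sum(row.values())
            for key in row:
                row[key] /= total
    return trans, emit


def self_train(seq: str, start: Dict[str, float], trans: Table, emit: Table,
               max_iter: int = 10) -> Tuple[List[str], Table, Table]:
    """Viterbi training: decode, re-count, repeat until the parse stops changing."""
    path = viterbi(seq, start, trans, emit)
    for _ in range(max_iter):
        trans, emit = reestimate(seq, path)
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:
            break
        path = new_path
    return path, trans, emit


if __name__ == "__main__":
    seq = "AT" * 5 + "GC" * 7 + "AT" * 5
    start = {"N": 0.5, "C": 0.5}
    trans = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
    # Weakly informed starting guess instead of parameters learned from a training set.
    emit = {"N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
            "C": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3}}
    path, trans, emit = self_train(seq, start, trans, emit)
    print("".join(path))
    print(emit["C"])
```

On this toy sequence, the weak starting guess already places the GC-rich segment in state `C`. Re-estimation from that parse then sharpens the emission tables. This shows the intuition behind self-training in miniature: a rough parse yields better parameters, which in turn support a more confident parse.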
Keywords :
DNA; ab initio calculations; biology computing; cellular biophysics; dynamic programming; genetics; hidden Markov models; molecular biophysics; molecular configurations; statistical analysis; ab initio gene finding algorithms; computational genomics; eukaryota; genome sequencing; genomes; microbiomes; optimization; Bioinformatics; DNA computing; Genomics; Parameter estimation; Pipelines; Sequences
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Bioengineering, 2007. BIBE 2007. Proceedings of the 7th IEEE International Conference on
Conference_Location :
Boston, MA
Print_ISBN :
978-1-4244-1509-0
Type :
conf
DOI :
10.1109/BIBE.2007.4375531
Filename :
4375531