DocumentCode :
3034065
Title :
Sequence transformation to a complex signature form for consistent phylogenetic tree using Extensible Markov Model
Author :
Kotamarti, Rao M. ; Hahsler, Michael ; Raiford, Douglas W. ; Dunham, Margaret H.
Author_Institution :
Dept. of Comput. Sci. & Eng., Southern Methodist Univ., Dallas, TX, USA
fYear :
2010
fDate :
2-5 May 2010
Firstpage :
1
Lastpage :
8
Abstract :
Phylogenetic tree analysis using molecular sequences continues to expand beyond the 16S rRNA marker. By addressing the multi-copy issue known as intra-heterogeneity, this paper restores the focus on using the 16S rRNA marker. Through use of a novel learning and model-building algorithm, the multiple gene copies are integrated into a compact complex signature using the Extensible Markov Model (EMM). The method clusters related sequence segments while preserving their inherent order to create an EMM signature for a microbial organism. A library of EMM signatures is generated, from which samples are drawn for phylogenetic analysis. By matching the components of two signatures, a process referred to as quasi-alignment, the differences are highlighted and scored. Quasi-alignments are scored using adapted Karlin-Altschul statistics to compute a novel distance metric. The metric satisfies the conditions of identity, symmetry, the triangle inequality, and the four-point rule required for a valid evolutionary distance metric. The resulting distance matrix is input to the PHYLogeny Inference Package (PHYLIP) to generate phylogenies using neighbor-joining algorithms. Through control of clustering in signature creation, the diversity of similar organisms and their placement in the phylogeny are explained. The experiments include analysis of the genus Burkholderia, a random microbial sample spanning several phyla, and a diverse sample that includes RNA of eukaryotic origin. The NCBI sequence data for 16S rRNA is used for validation.
Keywords :
Markov processes; bioinformatics; directed graphs; learning (artificial intelligence); molecular biophysics; 16S rRNA marker; EMM signature; Karlin-Altschul statistics; PHYLIP; PHYLogeny Inference Package; Phylogenetic tree analysis; consistent phylogenetic tree; distance matrix; extensible Markov model; genus Burkholderia; microbial organism; model building algorithm; molecular sequences; multiple gene copies; neighbor joining algorithms; sequence transformation; Bioinformatics; Clustering algorithms; Genomics; Libraries; Machine learning; Organisms; Phylogeny; RNA; Statistics; Taxonomy
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 2010 IEEE Symposium on
Conference_Location :
Montreal, QC
Print_ISBN :
978-1-4244-6766-2
Type :
conf
DOI :
10.1109/CIBCB.2010.5510472
Filename :
5510472