DocumentCode :
599147
Title :
A novel quasi-alignment-based method for discovering conserved regions in genetic sequences
Author :
Nagar, Atulya ; Hahsler, M.
Author_Institution :
Comput. Sci. & Eng, Southern Methodist Univ., Dallas, TX, USA
fYear :
2012
fDate :
4-7 Oct. 2012
Firstpage :
662
Lastpage :
669
Abstract :
This paper presents an alignment-free technique to efficiently discover similar regions in large sets of biological sequences using position-sensitive p-mer frequency clustering. A set of sequences is broken down into segments, and each segment is summarized by its frequency distribution over all oligomers of size p (referred to as p-mers). These summaries are clustered while the order of segments in the set of sequences is preserved in a Markov-type model. Sequence segments within each cluster have very similar DNA/RNA patterns and form a so-called quasi-alignment. This property can be used for a variety of tasks such as species characterization and identification, phylogenetic analysis, functional analysis of sequences and, as in this paper, discovering conserved regions. Our method is computationally more efficient than multiple sequence alignment because it can use modern data stream clustering algorithms, which run in time linear in the number of segments; it can therefore help discover highly similar regions across a large number of sequences efficiently. In this paper, we apply the approach to efficiently discover and visualize conserved regions in 16S rRNA.
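The pipeline in the abstract (segment, summarize by p-mer counts, cluster, keep segment order as cluster-to-cluster transitions) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' implementation: the fixed, non-overlapping segment length, the Euclidean distance, the one-pass leader clustering (a simple stand-in for the data stream clustering algorithms the paper refers to), and all function names (`segment`, `pmer_profile`, `leader_cluster`, `quasi_align`) are hypothetical choices made here.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: DNA alphabet only, no ambiguity codes


def segment(seq, seg_len):
    """Split a sequence into non-overlapping segments (the last one may be shorter)."""
    return [seq[i:i + seg_len] for i in range(0, len(seq), seg_len)]


def pmer_profile(seg, p):
    """Count overlapping p-mers in a segment; return a vector over all 4**p p-mers."""
    counts = Counter(seg[i:i + p] for i in range(len(seg) - p + 1))
    return [counts.get("".join(k), 0) for k in product(ALPHABET, repeat=p)]


def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def leader_cluster(vectors, threshold):
    """One-pass leader clustering (hypothetical stand-in for stream clustering):
    assign each vector to the nearest leader within `threshold`, else start a new cluster."""
    leaders, labels = [], []
    for v in vectors:
        best, best_d = None, None
        for idx, c in enumerate(leaders):
            d = euclid(v, c)
            if d <= threshold and (best_d is None or d < best_d):
                best, best_d = idx, d
        if best is None:
            leaders.append(list(v))
            best = len(leaders) - 1
        labels.append(best)
    return leaders, labels


def quasi_align(seqs, seg_len, p, threshold):
    """Cluster all segments and count transitions between the clusters of
    consecutive segments of the same sequence (a Markov-type order model)."""
    profiles, owners = [], []
    for s_idx, seq in enumerate(seqs):
        for seg in segment(seq, seg_len):
            profiles.append(pmer_profile(seg, p))
            owners.append(s_idx)
    _, labels = leader_cluster(profiles, threshold)
    transitions = Counter()
    for k in range(1, len(labels)):
        if owners[k] == owners[k - 1]:  # only link segments within one sequence
            transitions[(labels[k - 1], labels[k])] += 1
    return labels, owners, transitions


labels, owners, transitions = quasi_align(["ACGTACGTTTTT", "ACGTACGTTTTT"], 4, 2, 0.0)
print(labels)       # [0, 0, 1, 0, 0, 1]
print(transitions)  # Counter({(0, 0): 2, (0, 1): 2})
```

Under these assumptions, a cluster whose segments come from most of the input sequences, and which is reached by similar transition paths, is a natural candidate for a conserved region; each pass over the segments is linear in their number for a bounded number of clusters.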
Keywords :
DNA; RNA; biology computing; evolution (biological); genetics; molecular biophysics; molecular configurations; 16S rRNA; DNA-RNA patterns; Markov-type model; alignment-free technique; biological sequences; functional analysis; genetic sequences; multiple sequences alignment; phylogenetic analysis; position sensitive p-mer frequency clustering; quasialignment-based method; sequence segments; stream clustering algorithms; Bioinformatics; Buildings; Databases; Genomics; Numerical models; Phylogeny; Visualization; DNA/RNA sequences; conserved sequences; multiple sequence alignment; quasi-alignment;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
978-1-4673-2746-6
Electronic_ISBN :
978-1-4673-2744-2
Type :
conf
DOI :
10.1109/BIBMW.2012.6470216
Filename :
6470216