Title :
Clustering Tandem Repeats via Trinucleotides
Author :
Liang, Yupu ; Sokol, D. ; Zelikovitz, Sarah
Author_Institution :
Dept. of Comput. Sci., City Univ. of New York, New York, NY, USA
Abstract :
Tandem repeats in DNA sequences are highly relevant to biological phenomena and diagnostic tools. Computational programs that discover these tandem repeats generate a huge volume of data, which is often difficult to decipher without further organization. In this paper, we describe a new method for post-processing tandem repeats through clustering. Our work presents multiple ways of expressing tandem repeats using the n-gram model with different clustering distance measures. Analysis of these clusters for chromosome 1 of the human genome shows that clustering tandem repeats according to 3-grams yields well-defined clusters. Our new, alignment-free method facilitates the analysis of the myriad tandem repeats that occur in the human genome, and we believe that this work will lead to new discoveries on the roles, origins, and significance of tandem repeats.
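The 3-gram representation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature weighting or which distance measures were used, so everything below is an assumption for illustration only: the function names `trinucleotide_profile` and `cosine_distance` are hypothetical, relative frequencies are used as features, and cosine distance stands in for whichever measures the paper actually evaluates.

```python
from collections import Counter
from itertools import product
from math import sqrt

ALPHABET = "ACGT"
# All 64 possible trinucleotides, in a fixed order, used as feature axes.
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]


def trinucleotide_profile(seq):
    """Return the relative frequency of each 3-gram in seq (assumed weighting)."""
    seq = seq.upper()
    # Slide a window of width 3 over the sequence and count each 3-gram.
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    # Only 3-grams over A, C, G, T are counted; anything else (e.g. N) is ignored.
    total = sum(counts[t] for t in TRINUCLEOTIDES)
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [counts[t] / total for t in TRINUCLEOTIDES]


def cosine_distance(u, v):
    """Cosine distance between two profiles (one possible distance measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


# Two rotations of the same repeat unit and one unrelated repeat.
acg_repeat = trinucleotide_profile("ACGACGACG")
cga_repeat = trinucleotide_profile("CGACGACGA")
at_repeat = trinucleotide_profile("ATATATATA")

print(cosine_distance(acg_repeat, cga_repeat))  # small: same 3-grams occur
print(cosine_distance(acg_repeat, at_repeat))   # no 3-grams in common
```

Because the profile depends only on which 3-grams occur and how often, rotations of the same repeat unit land close together without any sequence alignment. The resulting pairwise distances could then be passed to any standard clustering routine.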
Keywords :
DNA; biology computing; pattern clustering; DNA sequences; alignment-free method; biological phenomena; clustering distance measures; computational programs; diagnostic tools; n-gram model; post-processing tandem repeats; tandem repeats clustering; trinucleotides; Algorithm design and analysis; Biological cells; Clustering algorithms; Genomics; Humans; classification; clustering; human genome; n-grams; tandem repeats
Conference_Title :
Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
Conference_Location :
Brussels
Print_ISBN :
978-1-4673-5164-5
DOI :
10.1109/ICDMW.2012.57