• DocumentCode
    2886484
  • Title
    Clustering Tandem Repeats via Trinucleotides
  • Author
    Liang, Yupu ; Sokol, D. ; Zelikovitz, Sarah
  • Author_Institution
    Dept. of Comput. Sci., City Univ. of New York, New York, NY, USA
  • fYear
    2012
  • fDate
    10 Dec. 2012
  • Firstpage
    64
  • Lastpage
    71
  • Abstract
    Tandem repeats in DNA sequences are highly relevant to biological phenomena and to diagnostic tools. Computational programs that discover tandem repeats generate a huge volume of data, which is often difficult to interpret without further organization. In this paper, we describe a new method for post-processing tandem repeats through clustering. We present multiple ways of representing tandem repeats with the n-gram model, combined with different clustering distance measures. Analysis of the resulting clusters for chromosome 1 of the human genome shows that clustering tandem repeats by their 3-grams (trinucleotides) yields well-defined clusters. Our new, alignment-free method facilitates the analysis of the myriad tandem repeats that occur in the human genome, and we believe that this work will lead to new discoveries about the roles, origins, and significance of tandem repeats.
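    The abstract describes representing each tandem repeat by its n-grams and clustering with a distance measure, without specifying which measure or clustering algorithm. The following is a minimal, alignment-free sketch of that idea, not the authors' implementation: it assumes overlapping trinucleotide (3-gram) counts over the 64 possible trinucleotides, cosine distance, and threshold-based single-linkage clustering. All names (`trinucleotide_profile`, `cosine_distance`, `cluster`) and the threshold value are hypothetical, chosen for illustration.

```python
from itertools import product
import math

# Hypothetical illustration: all 64 trinucleotides in a fixed order,
# so every profile vector has the same layout.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRINUCLEOTIDES)}


def trinucleotide_profile(seq: str) -> list[int]:
    """Count overlapping 3-grams in a DNA string.
    Windows containing symbols other than A, C, G, T are skipped."""
    counts = [0] * len(TRINUCLEOTIDES)
    seq = seq.upper()
    for i in range(len(seq) - 2):
        j = INDEX.get(seq[i:i + 3])
        if j is not None:
            counts[j] += 1
    return counts


def cosine_distance(a: list[int], b: list[int]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros.
    Assumed distance measure, one of many possible choices."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def cluster(repeats: list[str], threshold: float) -> list[list[int]]:
    """Single-linkage clustering: repeats i and j end up in the same cluster
    if a chain of pairs with distance <= threshold connects them."""
    profiles = [trinucleotide_profile(r) for r in repeats]
    parent = list(range(len(repeats)))  # union-find forest

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(repeats)):
        for j in range(i + 1, len(repeats)):
            if cosine_distance(profiles[i], profiles[j]) <= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(repeats)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    repeats = ["CAGCAGCAGCAG", "AGCAGCAGCAGC", "ATATATATAT", "TATATATATA", "GGCGGCGGC"]
    print(cluster(repeats, threshold=0.2))  # [[0, 1], [2, 3], [4]]
```

    Because the profiles ignore position, two copies of the same motif read from different starting phases (CAG repeats versus AGC repeats) get nearly identical vectors and land in one cluster without any alignment step, while the AT and GGC repeats share no trinucleotides with them.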
  • Keywords
    DNA; biology computing; pattern clustering; DNA sequences; alignment-free method; biological phenomena; clustering distance measures; computational programs; diagnostic tools; n-gram model; post-processing tandem repeats; tandem repeats clustering; trinucleotides; Algorithm design and analysis; Biological cells; Clustering algorithms; Genomics; Humans; classification; clustering; human genome; n-grams; tandem repeats
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • Print_ISBN
    978-1-4673-5164-5
  • Type
    conf
  • DOI
    10.1109/ICDMW.2012.57
  • Filename
    6406424