• DocumentCode
    2742699
  • Title

    A Time and Space Efficient Algorithm for Alignment and Visualization of Sparse Retro-elements over Whole Genome Scale

  • Author

    Park, Sun-Young ; Kim, SeonYeong ; Ryu, Dong-Sung ; Cho, Hwan-Gue

  • Author_Institution
    Dept of Comput. Eng., Pusan Nat. Univ., Pusan, South Korea
  • fYear
    2011
  • fDate
    25-27 Jan. 2011
  • Firstpage
    121
  • Lastpage
    126
  • Abstract
    It is not easy to align DNA sequences of whole genomes to find some useful biomarkers (in general term), as whole genome has more than 10 mega bases. Thus, we do not apply the straightforward Smith-Waterman O(N2) time and space algorithms. If some specific DNA segments in which we are interested are very sparse, then the O(n2) dynamic programming algorithm for aligning those are inefficient since that algorithm depends on the total length of the input string rather than the number of markers appearing on it. When we consider the retro-element over the whole genome, this problem becomes very clear since the size of whole genome is much more than a few (about 10-20) retro-elements we want to compare. In this paper we propose another alignment problem that consists of only two kinds of symbols, retro-elements and other general symbols. Thus, the whole genome can be regarded as a binary string of ´1´ (retroelement) and ´0´ (others). We propose an alignment algorithm for simplified binary string to reveal the structural similarity of retro-elements over two different genomes. The time complexity of this algorithm is O(M2), where M denotes the number of retro-elements appearing in the genomes. We studied structural similarities of all HERV(Human Endogenous RetroViruses) element of four typical primates including Human, Chimpanzees, Orangutan and Rhesus monkey to show the usefulness of our algorithm. Our system successfully revealed the similarity distribution of retro elements for the four primates.
  • Keywords
    DNA; biology computing; cellular biophysics; computational complexity; dynamic programming; genetics; genomics; macromolecules; microorganisms; DNA sequence; HERV; Rhesus monkey; Smith-Waterman space efficient algorithm; Smith-Waterman time efficient algorithm; binary string; biomarkers; chimpanzee; dynamic programming algorithm; human endogenous retrovirus; orangutan; similarity distribution; sparse retroelement; time complexity; whole genomes; Bioinformatics; Biological cells; DNA; Dynamic programming; Genomics; Heuristic algorithms; Humans; retro-elements; sequence alignments; whole genome;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Intelligent Systems, Modelling and Simulation (ISMS), 2011 Second International Conference on
  • Conference_Location
    Kuala Lumpur
  • Print_ISBN
    978-1-4244-9809-3
  • Type

    conf

  • DOI
    10.1109/ISMS.2011.29
  • Filename
    5730332