• DocumentCode
    652274
  • Title

    Genome Assembly on a Multicore System

  • Author

    Biswas, Arijit ; Ranjan, Desh ; Zubair, Mohammad

  • Author_Institution
    Dept. of Comput. Sci., Old Dominion Univ., Norfolk, VA, USA
  • fYear
    2013
  • fDate
    16-18 July 2013
  • Firstpage
    1233
  • Lastpage
    1240
  • Abstract
    The genome assembly problem is to reconstruct the original DNA sequence of an organism from a large set of short (400-500 bp) overlapping fragments. The problem is particularly challenging in the presence of repeats, which are multiple identical or nearly identical stretches of DNA. MIRA is an open-source assembler that is widely used by biologists and works effectively in the presence of repeats. However, it is computationally intensive: for example, an assembly of one million fragments requires about 18.3 hours. The computation in the MIRA assembler is dominated by the contig-building phase, which is highly sequential in nature. In this paper, we propose a modification to the MIRA assembler that allows this computation to be parallelized while maintaining the quality of the assembly. We implemented the modified MIRA assembler on a 64-core system with eight Intel(R) Xeon(R) X7560 processors. We were able to speed up the contig-building phase by a factor of 55 on the 64-core system. Additionally, we parallelized the other phases of the MIRA assembler and were able to reduce the total execution time of the assembly from 18.3 hours (sequential) to 3.4 hours (speedup of 5.57) without sacrificing assembly quality. It is worth noting that the overall speedup is limited by Amdahl's Law, as parts of the original MIRA assembler are inherently sequential. For example, for one million reads, the sequential portion of the MIRA assembler takes about 2.78 hours doing I/O or other operations, which limits the overall speedup to 6.58.
  • Keywords
    DNA; biology computing; genomics; multiprocessing systems; parallel processing; public domain software; Amdahl's law; I/O operations; Intel(R) Xeon(R) X7560 processors; MIRA open source assembler; assembly quality; contig building phase; genome assembly problem; multicore system; organism DNA sequence generate; parallelized computation; total sequential execution time reduction; Assembly; Bioinformatics; Buildings; Computational modeling; Error correction; Genomics; Image edge detection; Multicore parallelism; OLC Graph Model; OpenMP; Parallel Genome Assembly;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Trust, Security and Privacy in Computing and Communications (TrustCom), 2013 12th IEEE International Conference on
  • Conference_Location
    Melbourne, VIC
  • Type

    conf

  • DOI
    10.1109/TrustCom.2013.148
  • Filename
    6680969
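
The Amdahl's Law bound quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and it assumes the 2.78 hours of sequential work stays fixed while the remaining time divides perfectly across cores, which real runs will not achieve.

```python
def amdahl_bound(total_hours, sequential_hours):
    """Upper limit on speedup when the sequential part cannot be parallelized.

    As the core count grows, the parallel part shrinks toward zero and the
    run time approaches sequential_hours, so the bound is total / sequential.
    """
    return total_hours / sequential_hours


def amdahl_speedup(total_hours, sequential_hours, cores):
    """Idealized speedup on `cores` cores (assumes perfect scaling of the
    parallel part; this is a modeling assumption, not a measured result)."""
    parallel_hours = total_hours - sequential_hours
    return total_hours / (sequential_hours + parallel_hours / cores)


# Figures from the abstract: 18.3 h sequential run, 2.78 h inherently sequential.
TOTAL_HOURS = 18.3
SEQUENTIAL_HOURS = 2.78

bound = amdahl_bound(TOTAL_HOURS, SEQUENTIAL_HOURS)
print(f"Amdahl bound: {bound:.2f}")  # Amdahl bound: 6.58
```

Calling `amdahl_speedup(TOTAL_HOURS, SEQUENTIAL_HOURS, 64)` gives the idealized figure for a 64-core machine under the same assumptions; it always stays below the bound because the sequential 2.78 hours is never reduced.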