• DocumentCode
    2379456
  • Title
    A distributed system for fast alignment of next-generation sequencing data
  • Author
    Srimani, Jaydeep K. ; Wu, Po-Yen ; Phan, John H. ; Wang, May D.
  • Author_Institution
    Dept. of Electr. & Comput. Eng., Georgia Inst. of Technol., Atlanta, GA, USA
  • fYear
    2010
  • fDate
    18 Dec. 2010
  • Firstpage
    579
  • Lastpage
    584
  • Abstract
    We developed a scalable distributed computing system using the Berkeley Open Infrastructure for Network Computing (BOINC) to align next-generation sequencing (NGS) data quickly and accurately. NGS technology is emerging as a promising platform for gene expression analysis due to its high sensitivity compared to traditional genomic microarray technology. However, despite these benefits, NGS datasets can be prohibitively large, requiring significant computing resources to obtain sequence alignment results. Moreover, as the data and alignment algorithms become more prevalent, it will become necessary to examine the effect of the many alignment parameters on various NGS systems. We validate the distributed software system by (1) computing simple timing results to show the speed-up gained by using multiple computers, (2) optimizing alignment parameters using simulated NGS data, and (3) computing NGS expression levels for a single biological sample using optimal parameters and comparing these expression levels to those of a microarray sample. Results indicate that the distributed alignment system achieves an approximately linear speed-up and correctly distributes sequence data to, and gathers alignment results from, multiple compute clients.
  • Keywords
    bioinformatics; data analysis; distributed algorithms; genetics; genomics; grid computing; optimisation; Berkeley Open Infrastructure for Network Computing; data alignment algorithms; data gathering; datasets; distributed software system; gene expression analysis; next-generation data sequencing; optimization; scalable distributed computing system; single biological sample; BOINC; distributed computing; gene expression analysis; next-generation sequencing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2010 IEEE International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-8303-7
  • Electronic_ISBN
    978-1-4244-8304-4
  • Type
    conf
  • DOI
    10.1109/BIBMW.2010.5703865
  • Filename
    5703865