• DocumentCode
    3588698
  • Title

    MapReduce based parallel suffix tree construction for human genome

  • Author

    Satish, Umesh Chandra ; Kondikoppa, Praveenkumar ; Seung-Jong Park ; Patil, Manish ; Shah, Rahul

  • Author_Institution
    Sch. of Electr. Eng. & Comput. Sci., Louisiana State Univ., Baton Rouge, LA, USA
  • fYear
    2014
  • Firstpage
    664
  • Lastpage
    670
  • Abstract
    Genome indexing is the basis for many bioinformatics applications. Read mapping(sequence alignment) is one such application where the goal is to align millions of short reads against reference genome. Several tools are available for read mapping which rely on different indexing techniques to expedite the alignment process. However, many of these contemporary alignment programs are sequential, memory intensive and cannot be easily scaled for larger genomes. Suffix tree is one of the most widely used data structures for indexing strings (genomes). Building a scalable suffix-tree based tool is particularly challenging due to the difficulties involved in parallel construction of the suffix tree. Several suffix tree construction techniques have been proposed till date with focus on space-time tradeoff. Most of these existing works address the construction issue for uniprocessor and cannot be easily extended to utilize modern multi-processor systems. In this paper we investigate and propose a MapReduce based parallel construction of suffix tree. We demonstrate the performance of the algorithm over commodity cluster using up to 32 nodes each having 8GB of primary memory.
  • Keywords
    bioinformatics; data structures; genomics; indexing; parallel processing; trees (mathematics); MapReduce based parallel suffix tree construction; bioinformatics application; commodity cluster; data structure; genome indexing technique; human genome; multiprocessor system; read mapping application; sequence alignment application; space-time tradeoff; uniprocessor system; Bioinformatics; Buildings; Data structures; Genomics; Indexing; Partitioning algorithms; Random access memory; genome; indexing; map-reduce; parallel; suffix tree;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on
  • Type

    conf

  • DOI
    10.1109/PADSW.2014.7097867
  • Filename
    7097867