• DocumentCode
    3453979
  • Title
    A modularized MapReduce framework to support RNA secondary structure prediction and analysis workflows
  • Author
    Boyu Zhang ; Yehdego, Daniel T. ; Johnson, Kyle L. ; Ming-Ying Leung ; Taufer, Michela
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Delaware, Newark, DE, USA
  • fYear
    2012
  • fDate
    4-7 Oct. 2012
  • Firstpage
    86
  • Lastpage
    93
  • Abstract
    Ribonucleic acid (RNA) molecules play important roles in many biological processes including gene expression and regulation. Their secondary structures are crucial for RNA functionality, and the prediction of the secondary structures is widely studied. Previous research shows that cutting long sequences into shorter chunks, predicting secondary structures of the chunks independently using thermodynamic methods, and reconstructing the entire secondary structure from the predicted chunk structures tend to yield better accuracy than predicting the secondary structure using the entire RNA sequence as a whole. The chunking, prediction, and reconstruction processes can use different methods and parameters, some of which produce more accurate predictions than others. The RNA sequence can be cut into chunks using different cutting methods and chunk lengths. Several prediction methods, with different degrees of accuracy and computing requirements, can be used. The reconstruction of shorter predictions into the entire sequence can rely on simply gluing the parts together or on using more sophisticated merging algorithms. To allow scientists to perform a systematic analysis of the impact of the several methods and parameters, we propose a modularized framework using MapReduce. The framework enables scientists to automatically and efficiently explore large parametric spaces of chunking, prediction, reconstruction, and analysis methods. This paper shows how the MapReduce framework allows scientists to gain insights about different chunking strategies easily, accurately, and efficiently.
  • Keywords
    RNA; biology computing; distributed processing; RNA secondary structure prediction; analysis workflows; chunk structures; chunking process; gene expression; gene regulation; modularized MapReduce framework; prediction process; reconstruction process; ribonucleic acid molecules; thermodynamic methods; Accuracy; Algorithm design and analysis; Bioinformatics; Educational institutions; Electronic mail; Prediction algorithms; RNA; Hadoop; Parallel programming; bioinformatics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on
  • Conference_Location
    Philadelphia, PA
  • Print_ISBN
    978-1-4673-2746-6
  • Electronic_ISBN
    978-1-4673-2744-2
  • Type
    conf
  • DOI
    10.1109/BIBMW.2012.6470251
  • Filename
    6470251