• DocumentCode
    588171
  • Title
    MARISSA: MApReduce Implementation for Streaming Science Applications
  • Author
    Dede, E. ; Fadika, Z. ; Hartog, J. ; Govindaraju, M. ; Ramakrishnan, Lavanya ; Gunter, Dan ; Canon, Richard
  • Author_Institution
    SUNY Binghamton, Binghamton, NY, USA
  • fYear
    2012
  • fDate
    8-12 Oct. 2012
  • Firstpage
    1
  • Lastpage
    8
  • Abstract
    Since its inception, MapReduce has been steadily gaining ground in various scientific disciplines, ranging from space exploration to protein folding. However, the model poses a challenge for a wide range of current and legacy scientific applications that need to address their “Big Data” challenges. For example, MapReduce's best-known implementation, Apache Hadoop, offers native support only for Java applications. While Hadoop streaming supports applications compiled in a variety of languages such as C, C++, Python and FORTRAN, streaming has been shown to be a less efficient MapReduce alternative in terms of performance and effectiveness. Additionally, Hadoop streaming offers fewer options than its native counterpart, and as such offers less flexibility along with a limited array of features for scientific software. The Hadoop File System (HDFS), a central pillar of Apache Hadoop, is not a POSIX-compliant file system. In this paper, we present an alternative framework to Hadoop streaming to address the needs of scientific applications: MARISSA (MApReduce Implementation for Streaming Science Applications). We describe MARISSA's design and explain how it expands the set of scientific applications that can benefit from the MapReduce model. We also compare and explain the performance gains of MARISSA over Hadoop streaming.
  • Keywords
    C++ language; Java; distributed processing; Apache Hadoop; C languages; C++ languages; FORTRAN languages; HDFS; Hadoop file system; Hadoop streaming; Java applications; MARISSA; POSIX compliant file system; big data; mapreduce implementation for streaming science applications; protein folding; space exploration; Arrays; Data models; Fault tolerance; Fault tolerant systems; File systems; Java; Peer to peer computing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    E-Science (e-Science), 2012 IEEE 8th International Conference on
  • Conference_Location
    Chicago, IL
  • Print_ISBN
    978-1-4673-4467-8
  • Type
    conf
  • DOI
    10.1109/eScience.2012.6404432
  • Filename
    6404432