• DocumentCode
    154096
  • Title

    Reducing MapReduce Abstraction Costs for Text-centric Applications

  • Author

    Hsiao, Chun-Hung; Cafarella, Michael; Narayanasamy, Satish

  • Author_Institution
    Univ. of Michigan, Ann Arbor, MI, USA
  • fYear
    2014
  • fDate
    9-12 Sept. 2014
  • Firstpage
    40
  • Lastpage
    49
  • Abstract
    The MapReduce framework has become widely popular for programming large clusters, even though MapReduce jobs may use underlying resources relatively inefficiently. There has been substantial research in improving MapReduce performance for applications that were inspired by relational database queries, but almost none for text-centric applications, including inverted index construction, processing large log files, and so on. We identify two simple optimizations to improve MapReduce performance on text-centric tasks: frequency-buffering and spill-matcher. The former approach improves buffer efficiency for intermediate map outputs by identifying frequent keys, effectively shrinking the amount of work that the shuffle phase must perform. Spill-matcher is a runtime controller that improves parallelization of MapReduce framework background tasks. Together, our two optimizations improve the performance of text-centric applications by up to 39.1%. We demonstrate gains on both a small local cluster and Amazon's EC2 cloud service. Unlike other MapReduce optimizations, these techniques require no user code changes, and only small changes to the MapReduce system.
  • Keywords
    cloud computing; optimisation; parallel programming; relational databases; text analysis; Amazon's EC2 cloud service; MapReduce abstraction cost reduction; MapReduce framework background task parallelization; MapReduce performance improvement; buffer efficiency; frequency-buffering; frequent keys; runtime controller; shuffle phase; spill-matcher; text-centric applications; text-centric tasks; Indexes; Instruction sets; Optimization; Parallel processing; Runtime; Sorting; Standards
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Parallel Processing (ICPP), 2014 43rd International Conference on
  • Conference_Location
    Minneapolis, MN
  • ISSN
    0190-3918
  • Type

    conf

  • DOI
    10.1109/ICPP.2014.13
  • Filename
    6957213