• DocumentCode
    236536
  • Title
    To Overlap or Not to Overlap: Optimizing Incremental MapReduce Computations for On-Demand Data Upload
  • Author
    Ene, Stefan ; Nicolae, Bogdan ; Costan, Alexandru ; Antoniu, Gabriel

  • Author_Institution
    Univ. Politeh. Bucharest, Bucharest, Romania
  • fYear
    2014
  • fDate
    21 Nov. 2014
  • Firstpage
    9
  • Lastpage
    16
  • Abstract
    Research on cloud-based Big Data analytics has so far focused on optimizing the performance and cost-effectiveness of the computations, while largely neglecting an important aspect: users first need to upload massive datasets to the cloud before computing on them. This paper studies the problem of running MapReduce applications while jointly optimizing the performance and cost of the data upload and its corresponding computation. We analyze the feasibility of incremental MapReduce approaches that advance the computation as much as possible during the upload by using already transferred data to compute intermediate results. Our key finding is that overlapping the transfer time with as many incremental computations as possible is not always efficient: a better solution is to wait until enough data has arrived to fill the computational capacity of the MapReduce cluster. Results show significant performance and cost reductions compared with state-of-the-art solutions that leverage incremental computations in a naive fashion.
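    The trade-off described in the abstract can be illustrated with a toy model. This is a hypothetical sketch, not the authors' model or algorithm. It assumes that the dataset arrives as equal-sized chunks at a fixed rate, that the cluster runs one incremental job at a time with `capacity` parallel map slots, and that every job pays a fixed `job_overhead` (standing in for scheduling and result-merging costs). The hypothetical `threshold` parameter decides when a job is launched: `threshold=1` overlaps the transfer with as many incremental jobs as possible, while `threshold=capacity` waits until enough chunks are pending to fill the cluster. Total busy time is used here only as a rough proxy for cost.

    ```python
    import math
    from dataclasses import dataclass


    @dataclass
    class Result:
        makespan: float   # time at which the last incremental job finishes
        jobs: int         # number of incremental jobs launched
        busy_time: float  # total time the cluster spent running jobs (cost proxy)


    def simulate(n_chunks, upload_time, capacity, task_time, job_overhead, threshold):
        """Toy model (assumption): chunk k finishes uploading at (k + 1) * upload_time.

        When the cluster is idle, a job is launched over all pending chunks if at
        least `threshold` are pending, or if the upload is complete and any remain.
        A job over b chunks takes job_overhead + ceil(b / capacity) * task_time.
        """
        arrivals = [(k + 1) * upload_time for k in range(n_chunks)]
        now = 0.0
        processed = 0  # chunks already consumed by earlier jobs
        jobs = 0
        busy = 0.0
        while processed < n_chunks:
            arrived = sum(1 for t in arrivals if t <= now)  # chunks uploaded so far
            pending = arrived - processed
            upload_done = arrived == n_chunks
            if pending >= threshold or (upload_done and pending > 0):
                # Run one incremental job over every pending chunk, in map waves.
                duration = job_overhead + math.ceil(pending / capacity) * task_time
                now += duration
                busy += duration
                processed += pending
                jobs += 1
            else:
                # Not enough data yet: stay idle until the next chunk arrives.
                now = arrivals[arrived]
        return Result(now, jobs, busy)


    if __name__ == "__main__":
        for thr in (1, 4, 8):  # eager, fill one wave, wait for the whole dataset
            print(thr, simulate(8, 1.0, 4, 2.0, 3.0, threshold=thr))
    ```

    With these illustrative parameters (8 chunks, one arriving per time unit, 4 map slots, 2 time units per map wave, 3 units of per-job overhead), the eager policy finishes at t=18 after three jobs, waiting for a full wave finishes at t=14 after two jobs, and waiting for the whole dataset finishes at t=15. The numbers depend entirely on the assumed parameters and are not results from the paper.
    
    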
  • Keywords
    Big Data; data analysis; parallel processing; MapReduce applications; cloud-based big data analytics; computational capacity; incremental MapReduce computation optimization; on-demand data upload; performance optimization; transfer time; Algorithm design and analysis; Cloud computing; Computational modeling; Context; Data models; Data transfer; Throughput; MapReduce; data management; incremental processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data-Intensive Computing in the Clouds (DataCloud), 2014 5th International Workshop on
  • Conference_Location
    New Orleans, LA
  • Type
    conf
  • DOI
    10.1109/DataCloud.2014.7
  • Filename
    7017948