• DocumentCode
    3717147
  • Title
    Composable and efficient functional big data processing framework
  • Author
    Dongyao Wu; Sherif Sakr; Liming Zhu; Qinghua Lu
  • Author_Institution
    Software Systems Research Group, NICTA, Sydney, Australia
  • fYear
    2015
  • Firstpage
    279
  • Lastpage
    286
  • Abstract
    Over the past years, frameworks such as MapReduce and Spark have been introduced to ease the task of developing big data programs and applications. However, jobs in these frameworks are coarsely defined and packaged as executable JARs, and none of their functionality is exposed or described. As a result, deployed jobs are not natively composable or reusable in subsequent development. This also hampers the ability to apply optimizations to the data flow of job sequences and pipelines. In this paper, we present the Hierarchically Distributed Data Matrix (HDM), a functional, strongly-typed data representation for writing composable big data applications. Along with HDM, a runtime framework is provided to support the execution of HDM applications on distributed infrastructures. Based on the functional data dependency graph of HDM, multiple optimizations are applied to improve the performance of executing HDM jobs. The experimental results show that, compared with the current state of the art, Apache Spark, our optimizations improve Job-Completion-Time by between 10% and 60% for different types of operation sequences.
  • Keywords
    "Semantics","Optimization","Big data","Distributed databases","Sparks","Programming","Writing"
  • Publisher
    IEEE
  • Conference_Title
    Big Data (Big Data), 2015 IEEE International Conference on
  • Type
    conf
  • DOI
    10.1109/BigData.2015.7363765
  • Filename
    7363765