• DocumentCode
    9223
  • Title
    Elastic Scaling for Data Stream Processing

  • Author
    Gedik, Bugra ; Schneider, Scott ; Hirzel, Martin ; Wu, Kun-Lung
  • Author_Institution
    Comput. Eng. Dept., Bilkent Univ., Ankara, Turkey
  • Volume
    25
  • Issue
    6
  • fYear
    2014
  • fDate
    Jun-14
  • Firstpage
    1447
  • Lastpage
    1463
  • Abstract
    This article addresses the profitability problem associated with auto-parallelization of general-purpose distributed data stream processing applications. Auto-parallelization involves locating regions in the application's data flow graph that can be replicated at run-time to apply data partitioning, in order to achieve scale. In order to make auto-parallelization effective in practice, the profitability question needs to be answered: How many parallel channels provide the best throughput? The answer to this question changes depending on the workload dynamics and resource availability at run-time. In this article, we propose an elastic auto-parallelization solution that can dynamically adjust the number of channels used to achieve high throughput without unnecessarily wasting resources. Most importantly, our solution can handle partitioned stateful operators via run-time state migration, which is fully transparent to the application developers. We provide an implementation and evaluation of the system on an industrial-strength data stream processing platform to validate our solution.
  • Keywords
    data analysis; data flow graphs; parallel processing; profitability; application data flow graph; auto-parallelization; data partitioning; elastic scaling; general-purpose distributed data stream processing applications; industrial-strength data stream processing platform; parallel channels; profitability problem; resource availability; run-time state migration; workload dynamics; Availability; Indexes; Measurement; Parallel processing; Runtime; Safety; Throughput; Data stream processing; elasticity; parallelization
  • fLanguage
    English
  • Journal_Title
    Parallel and Distributed Systems, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9219
  • Type
    jour
  • DOI
    10.1109/TPDS.2013.295
  • Filename
    6678504