• DocumentCode
    3123171
  • Title
    Supporting Generic Cost Models for Wide-Area Stream Processing
  • Author
    Papaemmanouil, Olga ; Cetintemel, U. ; Jannotti, John
  • Author_Institution
    Department of Comput. Sci., Brandeis Univ., Waltham, MA
  • fYear
    2009
  • fDate
    March 29 2009-April 2 2009
  • Firstpage
    1084
  • Lastpage
    1095
  • Abstract
    Existing stream processing systems are optimized for a specific metric, which may limit their applicability to diverse applications and environments. This paper presents XFlow, a generic data stream collection, processing, and dissemination system that addresses this limitation efficiently. XFlow can express and optimize a variety of optimization metrics and constraints by distributing stream processing queries across a wide-area network. It uses metric-independent decentralized algorithms that work on localized, aggregated statistics, while avoiding local optima. To facilitate lightweight dynamic changes to the query deployment, XFlow relies on a loosely coupled, flexible architecture consisting of multiple publish-subscribe overlay trees that can gracefully scale and adapt to changes in network and workload conditions. Based on the desired performance goals, the system progressively refines the query deployment, the structure of the overlay trees, and the statistics collection process. We provide an overview of XFlow's architecture and discuss its decentralized optimization model. We demonstrate its flexibility and effectiveness using real-world streams and experimental results obtained from XFlow's deployment on PlanetLab. The experiments reveal that XFlow can effectively optimize various performance metrics in the presence of varying network and workload conditions.
  • Keywords
    middleware; query processing; statistical analysis; trees (mathematics); PlanetLab; XFlow; aggregated statistics; data dissemination system; data processing; data stream collection; decentralized optimization model; generic cost models; metric-independent decentralized algorithms; optimization metrics; publish-subscribe overlay trees; query deployment; statistics collection process; wide-area stream processing; Application software; Computer science; Constraint optimization; Costs; Data engineering; Logic; Monitoring; Peer to peer computing; Statistical distributions; USA Councils; Overlay Networks; Publish Subscribe; Stream Processing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type
    conf
  • DOI
    10.1109/ICDE.2009.11
  • Filename
    4812479