• DocumentCode
    473303
  • Title

    Query-Aware Partitioning for Monitoring Massive Network Data Streams

  • Author

    Johnson, Theodore ; Muthukrishnan, S. ; Shkapenyuk, Vladislav ; Spatscheck, Oliver

  • Author_Institution
    AT&T Labs - Research, Florham Park, NJ
  • fYear
    2008
  • fDate
    7-12 April 2008
  • Firstpage
    1528
  • Lastpage
    1530
  • Abstract
    Data stream management systems (DSMS) are gaining acceptance for applications that need to process very large volumes of data in real time. The load generated by such applications frequently far exceeds the computational capabilities of a single centralized server. In particular, a single-server instance of our DSMS, Gigascope, cannot keep up with the processing demands of the new OC-768 networks, which can generate more than 100 million packets per second. In this paper, we explore a mechanism for the distributed processing of very high speed data streams. Existing distributed DSMSs employ two mechanisms for distributing the load across the participating machines: partitioning of the query execution plans and partitioning of the input data stream in a query-independent fashion. However, for a large class of queries, both approaches fail to reduce the load compared to a centralized system, and can even increase it. We present an alternative approach, query-aware data stream partitioning, that allows for more efficient scaling. We have developed methods to analyze any given query node to determine a partitioning strategy, to reconcile the potentially conflicting requirements that different queries in a query set place on partitioning, and to choose an optimal partitioning that minimizes overall communication costs.
  • Keywords
    distributed processing; query processing; Gigascope; OC-768 networks; data stream management systems; input data stream; monitoring massive network data streams; query execution plans; query-aware data stream partitioning; query-aware partitioning; single-server instance; very high speed data streams; very large volumes; Computer network management; Computer networks; Condition monitoring; Cost function; Feeds; Network servers; Real time systems; Telecommunication traffic; Web server
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
  • Conference_Location
    Cancun
  • Print_ISBN
    978-1-4244-1836-7
  • Electronic_ISBN
    978-1-4244-1837-4
  • Type

    conf

  • DOI
    10.1109/ICDE.2008.4497612
  • Filename
    4497612