• DocumentCode
    3124255
  • Title

    CoTS: A Scalable Framework for Parallelizing Frequency Counting over Data Streams

  • Author

    Das, Sudipto ; Antony, Shyam ; Agrawal, Divyakant ; El Abbadi, Amr

  • Author_Institution
    Dept. of Comput. Sci., Univ. of California, Santa Barbara, Santa Barbara, CA
  • fYear
    2009
  • fDate
    March 29 2009-April 2 2009
  • Firstpage
    1323
  • Lastpage
    1326
  • Abstract
    Frequency counting, frequent elements and top-k queries form a class of operators that are used for a wide range of stream analysis applications. In spite of the abundance of these algorithms, all known techniques for answering data stream queries are sequential in nature. The imminent ubiquity of chip multi-processor (CMP) architectures requires algorithms that can exploit the parallelism of such architectures. In this paper, we first evaluate different naive techniques for intra-operator parallelism, and summarize the insights obtained from the naive techniques. Our experimental analysis of the naive designs shows that intra-operator parallelism is not straightforward and requires a complete redesign of the system. We then propose an efficient and scalable framework for parallelizing frequency counting, frequent elements and top-k queries over data streams. The proposed CoTS (co-operative thread scheduling) framework is based on the principle of threads co-operating rather than contending. Our experiments on a state-of-the-art quad-core chip-multiprocessor architecture and synthetic data sets demonstrate the scalability of the proposed framework, and the efficiency is demonstrated by peak processing throughput of more than 60 million elements per second.
  • Keywords
    parallel architectures; processor scheduling; CMP; CoTS; chip multi-processor; cooperative thread scheduling; data streams; intra-operator parallelism; parallelizing frequency counting; peak processing throughput; stream analysis applications; synthetic data sets; top-k queries; Computer science; Counting circuits; Data engineering; Frequency; Monitoring; Parallel processing; Scalability; Throughput; USA Councils; Yarn;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Engineering, 2009. ICDE '09. IEEE 25th International Conference on
  • Conference_Location
    Shanghai
  • ISSN
    1084-4627
  • Print_ISBN
    978-1-4244-3422-0
  • Electronic_ISBN
    1084-4627
  • Type

    conf

  • DOI
    10.1109/ICDE.2009.231
  • Filename
    4812531