• DocumentCode
    480189
  • Title
    Efficiently Filtering Duplicates over Distributed Data Streams
  • Author
    Wang, Xiaowei ; Zhang, Qiang ; Jia, Yan
  • Author_Institution
    Sch. of Comput., Nat. Univ. of Defense Technol., Changsha
  • Volume
    4
  • fYear
    2008
  • fDate
    12-14 Dec. 2008
  • Firstpage
    631
  • Lastpage
    634
  • Abstract
    We study the problem of filtering duplicate items over physically distributed data streams to provide clean data for real-time monitoring applications. Existing approaches filter only local duplicates within each stream, and their space and time costs make them impractical for high-speed data streams. Building on the Bloom filter, a space- and time-efficient data structure, we propose a novel local filtering algorithm that efficiently filters local duplicates, and then extend it to global duplicate filtering, which has not been addressed before. To accommodate the differing communication overhead that global duplicate filtering introduces, we present eager and lazy approaches to Bloom filter sharing. Theoretical and experimental results show that our solution filters duplicates efficiently both locally and globally, and that the errors remain small when the parameters are set properly.
  • Keywords
    data structures; distributed databases; random processes; Bloom filter sharing; distributed data stream; global duplicate filtering; random process; real-time monitoring application; space/time data structure; Aggregates; Computer science; Costs; Data structures; Distributed computing; Filtering algorithms; Filters; Sensor phenomena and characterization; Software engineering; Space technology; Bloom filter; distributed data stream; duplicate items
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Software Engineering, 2008 International Conference on
  • Conference_Location
    Wuhan, Hubei
  • Print_ISBN
    978-0-7695-3336-0
  • Type
    conf
  • DOI
    10.1109/CSSE.2008.1367
  • Filename
    4722698
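
The abstract describes filtering duplicates with a Bloom filter, first within one stream and then across streams by sharing filters between sites. The record does not give the paper's algorithm, so the following is only a minimal sketch of the general idea using a textbook Bloom filter. The class `BloomFilter`, the function `filter_local`, the sizes `m_bits` and `k_hashes`, and the use of a bitwise-OR `merge` to stand in for filter sharing are all assumptions made for illustration, not the authors' method.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: an m-bit array probed by k hash positions."""

    def __init__(self, m_bits=1 << 16, k_hashes=4):
        # m_bits and k_hashes are illustrative defaults, not values from the paper.
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item):
        # Derive k positions by double hashing over a single SHA-256 digest.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def seen_then_add(self, item):
        """Return True if the item was probably seen before, then record it."""
        positions = self._positions(item)
        seen = all((self.bits >> p) & 1 for p in positions)
        for p in positions:
            self.bits |= 1 << p
        return seen

    def merge(self, other):
        """Union with a same-shaped filter (hypothetical stand-in for sharing)."""
        assert (self.m, self.k) == (other.m, other.k)
        self.bits |= other.bits


def filter_local(stream, bf):
    """Yield only the items that the filter has not (probably) seen before."""
    for item in stream:
        if not bf.seen_then_add(item):
            yield item
```

In this sketch, a site that has processed `["a", "b", "a"]` emits `a` and `b` once each. A second site that merges the first site's filter before processing `["b", "c"]` would emit only `c`. A Bloom filter can report false positives, so a distinct item is occasionally dropped as a duplicate, but it never reports false negatives; this matches the abstract's remark that errors depend on how the parameters are set.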