• DocumentCode
    2533245
  • Title

    Approximately Detecting Duplicates for Probabilistic Data Streams over Sliding Windows

  • Author

    Wang, Xiujun ; Shen, Hong

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Sci. & Technol. of China, Hefei, China
  • fYear
    2010
  • fDate
    18-20 Dec. 2010
  • Firstpage
    263
  • Lastpage
    268
  • Abstract
    Abstract-A probabilistic data stream S is defined as a sequence of uncertain tuples <;ti, pi >;, i = 1...∞, with the semantics that element ti occurs in the stream with probability pi ϵ (0, 1). Thus each distinct element t, which occurs in tuples of S, has an existential probability based on the tuples: <; ti = t, pi >; ϵ S. Existing duplicate detection methods for a traditional deterministic data stream can´t maintain these existential probabilities for elements in S, which is important query information. In this paper, we present a novel data structure, Floating Counter Bloom Filter (FCBF), as an extension of CBF, which can maintain these existential probabilities effectively. Based on FCBF, we present an efficient algorithm to approximately detect duplicates for probabilistic data streams over sliding windows. Given a sliding window size W and floating counter number N, for any t which occurs in the past sliding window, our method outputs the accurate existential probability of t with probability 1-(1/2)ln(2)*N/W. Our experimental results on the synthetic data verify the effectiveness of our approach.
  • Keywords
    data structures; probability; query processing; replicated databases; duplicate detection methods; floating counter bloom filter; probabilistic data streams; query information; sliding windows; Accuracy; Approximation algorithms; Data structures; Filtering algorithms; Probabilistic logic; Probes; Radiation detectors; Counting Bloom Filter; Duplicate Detection; False Positive; Floating Counter Bloom Filter; Probabilistic Data Stream;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Parallel Architectures, Algorithms and Programming (PAAP), 2010 Third International Symposium on
  • Conference_Location
    Dalian
  • Print_ISBN
    978-1-4244-9482-8
  • Type

    conf

  • DOI
    10.1109/PAAP.2010.16
  • Filename
    5715092