• DocumentCode
    2551809
  • Title
    Study of sampling techniques and algorithms in data stream environments
  • Author
    Hu, Wenyu; Zhang, Baili
  • Author_Institution
    Dept. of Comput. & Inf. Sci., Fujian Univ. of Technol., Fuzhou, China
  • fYear
    2012
  • fDate
    29-31 May 2012
  • Firstpage
    1028
  • Lastpage
    1034
  • Abstract
    Sampling is the most versatile approximation technique available and remains one of the most powerful methods for building a one-pass synopsis of a data set in a streaming environment. Through a detailed review, we present a taxonomy of sampling algorithms and discuss and compare representative algorithms. Because uniform sampling has limitations in some applications, we examine in detail why biased sampling methods matter in those scenarios. We then survey the application and development of sampling techniques, especially traditional sampling techniques as applied in the data stream model. Finally, we discuss the research challenges and future directions of the sampling problem in the context of data streams.
  • Keywords
    data mining; data structures; sampling methods; approximation technique; biased sampling method; data stream mining; data structure; one-pass synopsis; representative sampling algorithm; taxonomic frame; Algorithm design and analysis; Approximation algorithms; Approximation methods; Classification algorithms; Heuristic algorithms; Reservoirs; biased sampling; data-stream sampling; synopsis data structure; uniform sampling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
  • Conference_Location
    Sichuan
  • Print_ISBN
    978-1-4673-0025-4
  • Type
    conf
  • DOI
    10.1109/FSKD.2012.6234278
  • Filename
    6234278