• DocumentCode
    2739722
  • Title
    Clustering High Dimensional Data Streams with Representative Points
  • Author
    Wang, Xiujun; Shen, Hong
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Univ. of Sci. & Technol. of China, Hefei, China
  • Volume
    1
  • fYear
    2009
  • fDate
    14-16 Aug. 2009
  • Firstpage
    449
  • Lastpage
    453
  • Abstract
    In this paper, we propose a novel algorithm for clustering high-dimensional data streams with representative data points. The fixed-size interval partitioning adopted by traditional grid-based clustering methods cannot capture clusters in each dimension well when applied to evolving high-dimensional data streams, and it may generate unnecessary dense grid cells that misrepresent clusters in a subspace. To overcome these drawbacks, we quantize each dimension (attribute) of the data points separately and use the resulting representative data points for each dimension instead of fixed-size intervals. These representative points are continuously updated with incoming data points, so they capture the cluster trends in each dimension more accurately than fixed-size intervals. Instead of discarding a historical data point as a whole, our algorithm confines discarding to the attribute level, using the statistics stored in the representative data points. This lets us keep the useful parts of data points and discard the trivial parts. Experimental results on synthetic and real data sets demonstrate the high effectiveness and accuracy of the proposed method.
  • Keywords
    grid computing; pattern clustering; clustering high dimensional data streams; fixed-size interval partitioning; grid based clustering methods; representative data points; Clustering algorithms; Clustering methods; Computer science; Displays; Fuzzy systems; Mesh generation; Partitioning algorithms; Probability; Shape; Statistics;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2009. FSKD '09. Sixth International Conference on
  • Conference_Location
    Tianjin
  • Print_ISBN
    978-0-7695-3735-1
  • Type
    conf
  • DOI
    10.1109/FSKD.2009.341
  • Filename
    5358539
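The abstract describes replacing fixed-width grid intervals with per-dimension representative points that are updated online, and discarding stale information per attribute rather than per data point. The sketch below is a minimal illustration of that idea, not the authors' algorithm: it assumes a simple online 1-D quantizer per attribute in which each value either opens a new representative (if it lies farther than a hypothetical `radius` from all existing ones) or pulls its nearest representative toward it by an incremental weighted mean. Weights fade by a hypothetical factor `decay`, and a representative whose weight falls below a hypothetical `min_weight` is dropped, which stands in for attribute-level discarding. A multi-dimensional point's grid cell is then the tuple of nearest-representative indices. All class names and parameters are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class RepPoint:
    value: float   # current position of the representative on this dimension
    weight: float  # exponentially faded count of values absorbed so far


class DimensionQuantizer:
    """Hypothetical online quantizer for ONE attribute (not the paper's exact rule)."""

    def __init__(self, k: int, radius: float, decay: float = 0.99,
                 min_weight: float = 0.05):
        self.k = k                    # assumed cap on representatives per dimension
        self.radius = radius          # assumed distance beyond which a new rep opens
        self.decay = decay            # assumed fading factor applied per arriving value
        self.min_weight = min_weight  # assumed cutoff: faded reps below it are dropped
        self.reps: List[RepPoint] = []

    def update(self, x: float) -> None:
        # Fade old statistics, then discard representatives that became trivial.
        for r in self.reps:
            r.weight *= self.decay
        self.reps = [r for r in self.reps if r.weight >= self.min_weight]
        nearest = min(self.reps, key=lambda r: abs(r.value - x), default=None)
        if nearest is None or (abs(nearest.value - x) > self.radius
                               and len(self.reps) < self.k):
            self.reps.append(RepPoint(x, 1.0))  # open a new representative
        else:
            # Incremental weighted mean: v' = (w * v + x) / (w + 1).
            nearest.value += (x - nearest.value) / (nearest.weight + 1.0)
            nearest.weight += 1.0
        self.reps.sort(key=lambda r: r.value)  # keep indices ordered along the axis

    def cell(self, x: float) -> int:
        # Index of the nearest representative, used in place of a fixed-width bin.
        return min(range(len(self.reps)), key=lambda i: abs(self.reps[i].value - x))


class StreamSketch:
    """One quantizer per dimension; a point's cell is a tuple of rep indices."""

    def __init__(self, dims: int, k: int, **kwargs):
        self.quantizers = [DimensionQuantizer(k, **kwargs) for _ in range(dims)]

    def update(self, point: Sequence[float]) -> None:
        for q, x in zip(self.quantizers, point):
            q.update(x)

    def cell(self, point: Sequence[float]) -> Tuple[int, ...]:
        return tuple(q.cell(x) for q, x in zip(self.quantizers, point))
```

For example, feeding points near (0, 0), (10, 10) and (0, 10) into `StreamSketch(2, k=3, radius=2.0)` yields two representatives per dimension, and `cell((0.1, 9.9))` maps to `(0, 1)`. Because the representatives follow the data, cell boundaries shift as the stream evolves, unlike the fixed-size intervals the abstract criticizes.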