• DocumentCode
    506568
  • Title

    A fast subspace partition clustering algorithm for high dimensional data streams

  • Author

    Zhang, Zhongping ; Wang, Hao

  • Author_Institution
    Coll. of Inf. Sci. & Eng., Yanshan Univ., Qinhuangdao, China
  • Volume
    1
  • fYear
    2009
  • fDate
    20-22 Nov. 2009
  • Firstpage
    491
  • Lastpage
    495
  • Abstract
    Data stream clustering is an important research problem in data stream mining. However, clustering arbitrary shapes over high-dimensional data streams has not been well addressed. In this paper, we propose a fast subspace-partition clustering algorithm for data streams that adopts a two-phase clustering framework. In the online component, we introduce the extension of adjacent unit (E-unit), a unit that shares an edge or vertex with a dense unit. An improved CD-tree lattice structure stores the information of non-empty units, maintains the positional relationships among units, and records the affiliation between dense units (D-units) and E-units. Outdated units are faded by a decay function, so that the corresponding micro-clusters are maintained dynamically. In the offline component, the final clusters are generated from all the micro-clusters by searching for D-units within a radius range. Experimental results show that SPDStream achieves higher clustering quality than CluStream, which cannot generate clusters of arbitrary shapes. Furthermore, our approach shows better scalability across different dimensionalities and partition granularities.
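    The online/offline split described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives no formulas, so the exponential decay `2 ** (-LAMBDA * dt)`, the constants `CELL`, `LAMBDA`, and `DENSE`, and every class and function name below are assumptions made for illustration. A plain dictionary stands in for the CD-tree lattice, and the offline "radius range" is taken to be one cell in every direction.

```python
from collections import deque
from itertools import product
import math

CELL = 1.0    # assumed partition granularity (side length of one unit)
LAMBDA = 0.1  # assumed decay rate; the abstract does not give the decay function
DENSE = 2.0   # assumed density threshold for a dense unit (D-unit)


class Grid:
    """Online component: non-empty units with time-decayed densities."""

    def __init__(self):
        # unit coordinates -> (density at last update, time of last update)
        self.units = {}

    def _decayed(self, key, now):
        density, last = self.units[key]
        return density * 2.0 ** (-LAMBDA * (now - last))

    def insert(self, point, now):
        key = tuple(math.floor(x / CELL) for x in point)
        old = self._decayed(key, now) if key in self.units else 0.0
        self.units[key] = (old + 1.0, now)

    def dense_units(self, now):
        return {k for k in self.units if self._decayed(k, now) >= DENSE}


def neighbors(key):
    """Units that touch `key`, i.e. differ by at most one step per dimension."""
    for offset in product((-1, 0, 1), repeat=len(key)):
        if any(offset):
            yield tuple(k + o for k, o in zip(key, offset))


def extension_units(grid, dense):
    """E-units: non-empty, non-dense units adjacent to at least one D-unit."""
    return {n for d in dense for n in neighbors(d)
            if n in grid.units and n not in dense}


def offline_clusters(dense):
    """Offline component: group D-units that are reachable through adjacency."""
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), set()
        while queue:
            unit = queue.popleft()
            members.add(unit)
            for n in neighbors(unit):
                if n in dense and n not in seen:
                    seen.add(n)
                    queue.append(n)
        clusters.append(members)
    return clusters
```

    Because `neighbors` enumerates all 3^d - 1 adjacent units, this sketch is only practical in a low-dimensional subspace; the paper's CD-tree presumably exists to avoid that enumeration, but the abstract does not say how.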
  • Keywords
    data mining; pattern clustering; tree data structures; CD-tree lattice structure; D-units; E-units; data stream clustering; data stream mining; decayed function; dense units; extension of adjacent unit; fast subspace partition clustering algorithm; high dimensional data streams; microclusters; partition granularity; two-phased clustering framework; Cities and towns; Clustering algorithms; Communications technology; Data engineering; Educational institutions; Information science; Lattices; Partitioning algorithms; Shape; clustering; data streams; subspace partition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Computing and Intelligent Systems, 2009. ICIS 2009. IEEE International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-4754-1
  • Electronic_ISBN
    978-1-4244-4738-1
  • Type

    conf

  • DOI
    10.1109/ICICISYS.2009.5357796
  • Filename
    5357796