• DocumentCode
    3439855
  • Title
    A Multi Density-Based Clustering Algorithm for Data Stream with Noise

  • Author
    Amini, Amin; Saboohi, Hadi; Teh Ying Wah

  • Author_Institution
    Dept. of Inf. Syst., Univ. of Malaya, Kuala Lumpur, Malaysia
  • fYear
    2013
  • fDate
    7-10 Dec. 2013
  • Firstpage
    1105
  • Lastpage
    1112
  • Abstract
    Density-based clustering algorithms can detect arbitrarily shaped clusters, handle outliers, and do not need the number of clusters in advance. However, they cannot work properly in multi-density environments. Existing multi-density clustering algorithms have problems that limit their applicability to data streams, such as the need for the whole data set to perform clustering, two-pass clustering, and high execution time. Data streams arrive continuously and have to be processed in limited time and memory. Therefore, we need an algorithm that clusters data streams with different densities and also overcomes the challenges of clustering data streams. In this paper, we introduce a multi-density clustering algorithm for data streams called MuDi-Stream. MuDi-Stream is an online-offline clustering algorithm in which the online phase forms core-mini-clusters using a newly proposed core distance and the offline phase clusters the core-mini-clusters using a density-based method. The new core distance, called the mini-core distance, is calculated based on the number of neighboring data points around the core. Therefore, the algorithm has different core distances for different clusters, which allows it to cover multi-density environments. A novel pruning strategy is also used to separate the real data from the noise by mapping the outliers onto a grid. The grid structure keeps the neighbors of each data point in order to determine the mini-core distance and remove noise effectively. Our performance study over synthetic data sets demonstrates the effectiveness of our method.
  • Keywords
    data handling; data mining; pattern clustering; MuDi-Stream; arbitrary shape cluster detection; core-miniclusters; data stream clustering; data stream mining; high execution time; mini core distance; multi density-based clustering algorithm; neighboring data points; noise removal; novel pruning strategy; online-offline clustering algorithm; outlier handling; outlier mapping; two-pass clustering; Clustering algorithms; Data mining; Fading; Indium phosphide; Noise; Partitioning algorithms; Shape; Evolving data streams; core-mini-cluster; density-based clustering; mini-core distance; multi-density;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
  • Conference_Location
    Dallas, TX
  • Print_ISBN
    978-1-4799-3143-9
  • Type
    conf
  • DOI
    10.1109/ICDMW.2013.170
  • Filename
    6754048