DocumentCode :
3439855
Title :
A Multi Density-Based Clustering Algorithm for Data Stream with Noise
Author :
Amini, Amin ; Saboohi, Hadi ; Teh Ying Wah
Author_Institution :
Dept. of Inf. Syst., Univ. of Malaya, Kuala Lumpur, Malaysia
fYear :
2013
fDate :
7-10 Dec. 2013
Firstpage :
1105
Lastpage :
1112
Abstract :
Density-based clustering can detect arbitrarily shaped clusters and handle outliers, and does not need the number of clusters in advance. However, it cannot work properly in multi-density environments. Existing multi-density clustering algorithms have problems that limit their applicability to data streams, such as the need for the whole data set to perform clustering, two-pass clustering, and high execution time. Data streams arrive continuously and have to be processed in limited time and memory. Therefore, we need an algorithm that clusters data streams with different densities and also overcomes the challenges of clustering data streams. In this paper, we introduce a multi-density clustering algorithm for data streams called MuDi-Stream. MuDi-Stream is an online-offline clustering algorithm in which the online phase forms core-mini-clusters using a newly proposed core distance and the offline phase clusters the core-mini-clusters using a density-based method. The new core distance, called the mini-core distance, is calculated based on the number of neighboring data points around the core. The algorithm therefore has different core distances for different clusters, which allows it to cover multi-density environments. A novel pruning strategy is also used to separate the real data from the noise by mapping the outliers onto a grid. The grid structure keeps the neighbors of each data point in order to determine the mini-core distance and remove noise effectively. Our performance study over synthetic data sets demonstrates the effectiveness of our method.
Keywords :
data handling; data mining; pattern clustering; MuDi-Stream; arbitrary shape cluster detection; core-miniclusters; data stream clustering; data stream mining; high execution time; mini core distance; multi density-based clustering algorithm; neighboring data points; noise removal; novel pruning strategy; online-offline clustering algorithm; outlier handling; outlier mapping; two-pass clustering; Clustering algorithms; Data mining; Fading; Indium phosphide; Noise; Partitioning algorithms; Shape; Evolving data streams; core-mini-cluster; density-based clustering; mini-core distance; multi-density;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4799-3143-9
Type :
conf
DOI :
10.1109/ICDMW.2013.170
Filename :
6754048