Title :
Clustering Algorithm for High Dimensional Data Stream over Sliding Windows
Author :
Liu, Weiguo ; OuYang, Jia
Author_Institution :
Sch. of Inf. Sci. & Eng., Central South Univ., Changsha, China
Abstract :
Data stream clustering faces major challenges in memory usage and processing speed. In addition, many data streams are high-dimensional in nature, and high-dimensional data are inherently more difficult to cluster. This paper proposes an effective clustering algorithm, referred to as HSWStream, for high-dimensional data streams over sliding windows. The algorithm handles high dimensionality with a projected clustering technique, tracks in-cluster evolution with an exponential histogram of cluster features (EHCF), and eliminates the influence of old points with fading temporal cluster features (FTCF). Meanwhile, through the exponential histogram mechanism, more information is kept about recent data and less about old data, which suits the evolving nature of data streams. Projected clustering yields higher cluster quality and faster execution, while the sliding window yields higher quality and lower memory usage. In addition, for greater efficiency, a fast computational method is used to maintain the EHCF. The main idea of this method is that a new data point need not be processed immediately; processing is deferred until an FTCF must be deleted from the corresponding EHCF. The experiments use evolving data streams drawn from the KDD-CUP'98 and KDD-CUP'99 real data sets and from synthetic data sets. The experimental results show that the proposed method achieves higher clustering quality, lower memory usage, and faster processing than other algorithms.
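To illustrate the exponential histogram idea mentioned in the abstract, the following is a minimal sketch and not the paper's HSWStream implementation. It assumes a generic exponential histogram in which each bucket stores a cluster feature (point count, linear sum, squared sum) plus the timestamp of its newest point, and in which the two oldest buckets of a given size are merged once more than `max_per_size` buckets of that size exist. The names `Bucket`, `ExpHistogram`, `window`, and `max_per_size` are hypothetical, and neither the fading weights nor the projected clustering step nor the deferred (fast) maintenance is modeled here.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Bucket:
    n: int            # number of points summarized by this bucket
    ls: List[float]   # per-dimension linear sum of the points
    ss: List[float]   # per-dimension sum of squares of the points
    t: int            # timestamp of the most recent point in the bucket


def merge(old: Bucket, new: Bucket) -> Bucket:
    """Combine two cluster features; the sums are additive."""
    return Bucket(
        old.n + new.n,
        [a + b for a, b in zip(old.ls, new.ls)],
        [a + b for a, b in zip(old.ss, new.ss)],
        max(old.t, new.t),
    )


class ExpHistogram:
    """Assumed, simplified exponential histogram of cluster features."""

    def __init__(self, window: int, max_per_size: int = 2):
        self.window = window
        self.max_per_size = max_per_size
        self.buckets: List[Bucket] = []  # ordered oldest first

    def insert(self, x: Sequence[float], t: int) -> None:
        # Drop buckets whose newest point has left the sliding window.
        self.buckets = [b for b in self.buckets if b.t > t - self.window]
        # Every new point starts as its own bucket of size 1.
        self.buckets.append(Bucket(1, list(x), [v * v for v in x], t))
        size = 1
        while True:
            idx = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(idx) <= self.max_per_size:
                break
            i = idx[0]  # same-size buckets are adjacent, oldest first
            self.buckets[i:i + 2] = [merge(self.buckets[i], self.buckets[i + 1])]
            size *= 2   # the merge may overflow the next size class

    def summary(self) -> Bucket:
        """Aggregate cluster feature over all live buckets (must be non-empty)."""
        total = self.buckets[0]
        for b in self.buckets[1:]:
            total = merge(total, b)
        return total
```

In this sketch, recent points sit in small buckets and older points in progressively larger ones, so detail is coarser for older data and the number of buckets grows only logarithmically with the number of points in the window.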
Keywords :
pattern clustering; EHCF; HSWStream; KDD-CUP'98 real data sets; KDD-CUP'99 real data sets; cluster feature exponential histogram; clustering algorithm; computational method; data stream clustering; evolving data streams; fading temporal cluster features; high dimensional data stream; high dimensional problem; in-cluster evolution; projected clustering technique; sliding windows; synthetic data sets; Algorithm design and analysis; Clustering algorithms; Data mining; Fading; Histograms; Maintenance engineering; Vectors; data stream; exponential histogram; projected clustering; sliding window;
Conference_Title :
Trust, Security and Privacy in Computing and Communications (TrustCom), 2011 IEEE 10th International Conference on
Conference_Location :
Changsha
Print_ISBN :
978-1-4577-2135-9
DOI :
10.1109/TrustCom.2011.213