Title :
Research in data stream clustering based on Gaussian Mixture Model Genetic Algorithm
Author :
Gao, Ming-ming ; Tai-Hua, Chang ; Gao, Xiang-xiang
Author_Institution :
School of Control and Computer Engineering, North China Electric Power University, Beijing, China
Abstract :
Clustering data streams is one of the important branches of data stream mining. Because data streams are dynamic and massive, traditional data mining algorithms cannot meet the requirements of online analysis, nor can they determine an appropriate number of clusters. Data stream techniques therefore focus on scanning the data in a single pass while incrementally maintaining an in-memory data structure that is far smaller than the whole data set. This paper proposes a new feature mining method, the Gaussian Mixture Model with Genetic Algorithm (GMMGA), which extends the Gaussian mixture model. GMMGA performs probability-density-based data stream clustering that requires only the newly arrived data rather than the entire historical data. It determines both the number of Gaussian clusters and the parameters of each Gaussian component through the random split and merge operations of a genetic algorithm. GMMGA also applies a threshold function to the clusters to reduce the effect of bad clusters on the clustering result. The algorithm improves the robustness and accuracy of the estimated number of clusters while saving memory and run time. Experimental results show that the method is effective and achieves higher clustering precision than the conventional STREAM and CluStream algorithms.
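The abstract names three ingredients: random split and merge of Gaussian components, a threshold that removes bad clusters, and a search that settles the number of components. The sketch below shows one plausible way to realise them. It is not the authors' GMMGA. It is restricted to 1-D data, uses moment-preserving split/merge rules, treats "bad clusters" as components whose mixing weight falls below a threshold, scores mixtures with a BIC-penalised log-likelihood, and replaces the full genetic algorithm with a mutation-only (1+1) evolutionary loop (no population, no crossover, no EM refinement). All names (`Component`, `merge`, `split`, `prune`, `bic_fitness`, `mutate`, `evolve`) and parameters (`u`, `threshold`, `generations`) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Component:
    weight: float  # mixing proportion (weights of a mixture sum to 1)
    mean: float
    var: float     # variance, kept strictly positive


def merge(a: Component, b: Component) -> Component:
    """Merge two components, preserving total weight, mean and second moment."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    second = (a.weight * (a.var + a.mean ** 2) + b.weight * (b.var + b.mean ** 2)) / w
    return Component(w, mean, max(second - mean ** 2, 1e-12))  # guard round-off


def split(c: Component, u: float = 0.5) -> tuple[Component, Component]:
    """Split one component in two with the same overall weight, mean and variance.

    Means move by +/- u standard deviations; each half's variance shrinks by
    (1 - u^2), so within- plus between-component variance still equals c.var.
    """
    sd = math.sqrt(c.var)
    v = c.var * (1 - u ** 2)
    return (Component(c.weight / 2, c.mean - u * sd, v),
            Component(c.weight / 2, c.mean + u * sd, v))


def prune(mix: list[Component], threshold: float) -> list[Component]:
    """Drop components lighter than `threshold`, then renormalise the weights."""
    kept = [c for c in mix if c.weight >= threshold]
    if not kept:  # never return an empty mixture: keep the heaviest component
        kept = [max(mix, key=lambda c: c.weight)]
    total = sum(c.weight for c in kept)
    return [Component(c.weight / total, c.mean, c.var) for c in kept]


def bic_fitness(mix: list[Component], data: list[float]) -> float:
    """Log-likelihood minus a BIC penalty, so extra components must pay off."""
    ll = 0.0
    for x in data:
        p = sum(c.weight * math.exp(-(x - c.mean) ** 2 / (2 * c.var))
                / math.sqrt(2 * math.pi * c.var) for c in mix)
        ll += math.log(max(p, 1e-300))
    n_params = 3 * len(mix) - 1  # k means, k variances, k - 1 free weights
    return ll - 0.5 * n_params * math.log(len(data))


def mutate(mix: list[Component], rng: random.Random) -> list[Component]:
    """Mutation on the number of components: randomly merge two or split one."""
    mix = list(mix)
    if len(mix) >= 2 and rng.random() < 0.5:
        i, j = rng.sample(range(len(mix)), 2)
        merged = merge(mix[i], mix[j])
        mix = [c for k, c in enumerate(mix) if k not in (i, j)] + [merged]
    else:
        i = rng.randrange(len(mix))
        mix = mix[:i] + mix[i + 1:] + list(split(mix[i]))
    return mix


def evolve(data: list[float], generations: int = 200, threshold: float = 0.05,
           seed: int = 0) -> list[Component]:
    """(1+1) search: keep a mutated, pruned mixture only if its fitness improves."""
    rng = random.Random(seed)
    n = len(data)
    mean = sum(data) / n
    var = max(sum((x - mean) ** 2 for x in data) / n, 1e-12)
    best = [Component(1.0, mean, var)]  # start from a single broad component
    best_fit = bic_fitness(best, data)
    for _ in range(generations):
        child = prune(mutate(best, rng), threshold)
        fit = bic_fitness(child, data)
        if fit > best_fit:
            best, best_fit = child, fit
    return best
```

For example, `evolve(chunk)` on a list of floats returns the surviving components. In a streaming setting one could, under the same assumptions, seed the search for each newly arrived chunk with the previous mixture instead of a single component, so that only the new data is scanned. The BIC penalty is what lets the search settle on a number of clusters: a split is accepted only when the likelihood gain outweighs the cost of three extra parameters.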
Keywords :
Adaptation model; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Data mining; Genetics; Training; Data stream clustering; Gaussian Mixture Model Genetic Algorithm; split and merge;
Conference_Title :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
DOI :
10.1109/ICISE.2010.5691641