DocumentCode :
3363502
Title :
M-kernel merging: towards density estimation over data streams
Author :
Zhou, Aoying ; Cai, Zhiyuan ; Wei, Li ; Qian, Weining
Author_Institution :
Dept. of Comput. Sci. & Eng., Fudan Univ., China
fYear :
2003
fDate :
26-28 March 2003
Firstpage :
285
Lastpage :
292
Abstract :
Density estimation is a costly operation for computing the distribution information of data sets, and it underlies many important data mining applications, such as clustering and biased sampling. However, traditional density estimation methods are inapplicable to streaming data, which arrive continuously and in large volume, because they require linear storage and computation quadratic in the data size. This shortcoming limits the application of many existing effective algorithms to data streams, for which mining is both an urgent need for applications and a challenge for research. In this paper, the problem of computing density functions over data streams is examined. A novel method that addresses this shortcoming is developed to enable density estimation for large volumes of data in linear time and fixed-size memory, without loss of accuracy. The method is based on M-Kernel merging, so that the limited number of kernel functions to be maintained is determined intelligently. The application of the new method to different streaming data models is discussed, and the results of intensive experiments are presented. The analytical and empirical results show that this new density estimation algorithm for data streams can calculate density functions on demand at any time with high accuracy for different streaming data models.
Keywords :
data mining; data models; merging; M-Kernel merging; biased sampling; clustering; data mining applications; data sets; density estimation; distribution information; limited kernel functions; linear storage; square size calculation; streaming data; Algorithm design and analysis; Application software; Computer science; Data engineering; Data mining; Data models; Density functional theory; Distributed computing; Information processing; Laboratories;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings. Eighth International Conference on
Conference_Location :
Kyoto, Japan
Print_ISBN :
0-7695-1895-8
Type :
conf
DOI :
10.1109/DASFAA.2003.1192393
Filename :
1192393