Title :
A Framework for Clustering Uncertain Data Streams
Author :
Aggarwal, Charu C. ; Yu, Philip S.
Author_Institution :
T.J. Watson Res. Center, IBM, Hawthorne, NY
Abstract :
In recent years, uncertain data management applications have grown in importance because of the large number of hardware applications that measure data approximately. For example, sensor readings are typically expected to contain considerable noise because of inaccuracies in data retrieval and transmission, and because of power failures. In many cases, the estimated error of the underlying data stream is available. This information is very useful for the mining process, since it can be used to improve the quality of the underlying results. In this paper, we propose a method for clustering uncertain data streams. We use a very general model of the uncertainty, in which we assume that only a few statistical measures of the uncertainty are available. We show that the use of even modest uncertainty information during the mining process is sufficient to greatly improve the quality of the underlying results, and that our approach is more effective than a purely deterministic method such as the CluStream approach. We test the approach on a variety of real and synthetic data sets and illustrate the advantages of the method in terms of effectiveness and efficiency.
Keywords :
data mining; statistical analysis; CluStream approach; mining process; uncertain data management applications; uncertain data streams clustering; uncertainty statistical measures; Cleaning; Data mining; Data privacy; Hardware; Information retrieval; Measurement uncertainty; Probability density function; Probability distribution; Statistical analysis; Testing
Conference_Title :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
DOI :
10.1109/ICDE.2008.4497423