DocumentCode :
3322242
Title :
LOCUST: An Online Analytical Processing Framework for High Dimensional Classification of Data Streams
Author :
Aggarwal, Charu C. ; Yu, Philip S.
Author_Institution :
T.J. Watson Res. Center, IBM, Hawthorne, NY
fYear :
2008
fDate :
7-12 April 2008
Firstpage :
426
Lastpage :
435
Abstract :
In recent years, data streams have become ubiquitous because of advances in hardware and software technology. The ability to adapt conventional mining problems to data streams is a great challenge in a data stream environment. Many data streams are inherently high dimensional, which creates a special challenge for data mining algorithms. In this paper, we consider the problem of classification of high dimensional data streams. For the high dimensional case, even traditional classifiers do not work very well on fixed data sets. We discuss a number of insights for the intractability of the high dimensional case. We use these insights to propose a new classification method (LOCUST) which avoids many of these weaknesses. The key is to develop a subspace-based instance centered classification approach which can be implemented efficiently for a fast data stream. We propose a methodology to effectively process the data stream in an organized way, so that the intermediate data structures can be used to sample locally discriminative subspaces for the classification process. We show that LOCUST is able to work effectively in the high dimensional case, and is also flexible in terms of increased robustness with greater resource availability.
Keywords :
data mining; pattern classification; LOCUST; conventional mining problem; data mining algorithm; data stream environment; intermediate data structure; online analytical processing framework; subspace-based instance centered classification approach; Automatic testing; Availability; Classification algorithms; Classification tree analysis; Computational modeling; Data mining; Decision trees; Hardware; Knowledge based systems; Robustness;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on
Conference_Location :
Cancun
Print_ISBN :
978-1-4244-1836-7
Electronic_ISBN :
978-1-4244-1837-4
Type :
conf
DOI :
10.1109/ICDE.2008.4497451
Filename :
4497451