DocumentCode :
3007093
Title :
Labeling Instances in Evolving Data Streams with MapReduce
Author :
Haque, Ashraful ; Parker, Brendon ; Khan, Latifur
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Dallas, TX, USA
fYear :
2013
fDate :
June 27 2013-July 2 2013
Firstpage :
387
Lastpage :
394
Abstract :
Unlike traditional data mining, where data is static, mining algorithms for data streams must process the data "on the fly" and update the class decision boundaries as the stream progresses, in order to address the challenges of concept drift and feature evolution. In our current work, we have proposed a fast and robust multi-tiered ensemble-based method that rapidly learns the concepts in a data stream, predicts labels for new data with high accuracy, and agilely tracks dynamic changes in the evolving concepts and feature space. The bottleneck of that work is that it needs to build an ADABOOST ensemble for each numeric feature. This can cause scalability issues, since the number of features in a data stream can at times be very large. In this paper, we propose a method to parallelize the independent parts of that work using a MapReduce framework. This increases scalability and achieves a significant speedup without compromising classification accuracy. We demonstrate the performance of our approach in terms of speedup, scale-up, and classification accuracy.
Keywords :
data mining; learning (artificial intelligence); ADABOOST ensemble; MapReduce; class decision boundary; data stream; labeling instances; multitiered ensemble; Accuracy; Distributed databases; Indexes; Scalability; Sports equipment; Training; Evolving Data Streams
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5006-0
Type :
conf
DOI :
10.1109/BigData.Congress.2013.58
Filename :
6597162