Regional Information Center for Science and Technology - Labeling Instances in Evolving Data Streams with MapReduce

DocumentCode :

3007093

Title :

Labeling Instances in Evolving Data Streams with MapReduce

Author :

Haque, Ashraful ; Parker, Brendon ; Khan, Latifur

Author_Institution :

Dept. of Comput. Sci., Univ. of Texas at Dallas, Dallas, TX, USA

fYear :

2013

fDate :

June 27, 2013 - July 2, 2013

Firstpage :

387

Lastpage :

394

Abstract :

Unlike traditional data mining, where data is static, mining algorithms for data streams must process the data "on the fly" and update the class decision boundaries as the stream progresses to address the challenges of concept drift and feature evolution. In our current work, we have proposed a fast and robust multi-tiered ensemble-based method that rapidly learns the concepts in a data stream, predicts labels for new data with strong accuracy, and agilely tracks the dynamic changes in the evolving concepts and feature space. The bottleneck of that work is that it needs to build an ADABOOST ensemble for each numeric feature. This can cause scalability issues, as the number of features in a data stream can at times be very large. In this paper, we propose a method to parallelize the independent parts of that work using a MapReduce framework. This increases scalability and achieves a significant speedup without compromising classification accuracy. We demonstrate the performance of our approach in terms of speedup, scale-up, and classification accuracy.

Keywords :

data mining; learning (artificial intelligence); ADABOOST ensemble; MapReduce; class decision boundary; data mining; data stream; labeling instances; multitiered ensemble; Accuracy; Data mining; Distributed databases; Indexes; Scalability; Sports equipment; Training; Evolving Data Streams; Labeling Instances; MapReduce;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Big Data (BigData Congress), 2013 IEEE International Congress on

Conference_Location :

Santa Clara, CA

Print_ISBN :

978-0-7695-5006-0

Type :

conf

DOI :

10.1109/BigData.Congress.2013.58

Filename :

6597162

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3007093