Title :
MapReduce Based Frameworks for Classifying Evolving Data Stream
Author :
Haque, Ashraful ; Khan, Latifur
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Dallas, Richardson, TX, USA
Abstract :
Data stream mining poses inherent challenges that are not present in traditional data mining. Stream data classification is difficult because of two important properties: infinite length and an evolving nature. In our current work, we have proposed a multi-tiered ensemble-based method to address these challenges. Although it is a fast and robust method, it needs to build a large number of AdaBoost ensembles for each numeric feature after receiving each new data chunk. Thus, it can face scalability issues, especially when the data chunk is large or the number of numeric attributes is high. To address this problem, we propose two different approaches that form this large number of AdaBoost ensembles using MapReduce-based parallelism. Each of these approaches helps our base method achieve significant scalability without compromising classification accuracy. We compare these approaches from different aspects of design. We also demonstrate and compare the performance of the approaches on benchmark datasets in terms of execution time, speedup, and scaleup.
Keywords :
data mining; learning (artificial intelligence); parallel programming; pattern classification; AdaBoost ensembles; MapReduce based parallelism; MapReduce-based frameworks; base method; benchmark datasets; data chunk; data mining; data stream mining; evolving data stream classification; evolving nature properties; execution time; infinite length properties; multitiered ensemble-based method; numeric feature; scalability issue; scaleup factor; speedup factor; Accuracy; Buildings; Data mining; Distributed databases; Indexes; Scalability; Training; Distributed Processing; Evolving Data Stream; MapReduce; Scalability;
Conference_Title :
Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on
Conference_Location :
Dallas, TX
Print_ISBN :
978-1-4799-3143-9
DOI :
10.1109/ICDMW.2013.145