DocumentCode :
3007311
Title :
Distributed Stochastic Aware Random Forests -- Efficient Data Mining for Big Data
Author :
Assuncao, Jose ; Fernandes, Paulo ; Lopes, Luis ; Normey, Silvio
Author_Institution :
Comput. Sci. Dept., PUCRS Univ., Porto Alegre, Brazil
fYear :
2013
fDate :
June 27 2013-July 2 2013
Firstpage :
425
Lastpage :
426
Abstract :
Some top data mining algorithms, such as ensemble classifiers, may be inefficient on very large data sets. This paper makes an initial proposal of a distributed ensemble classifier algorithm for Big Data, based on the popular Random Forests. The proposed algorithm aims to improve efficiency through a distributed processing model called MapReduce. At the same time, it aims to reduce the impact of randomness by following an algorithm called Stochastic Aware Random Forests (SARF).
Keywords :
data mining; distributed processing; pattern classification; MapReduce; SARF; big data; data mining algorithm; distributed ensemble classifier algorithm; distributed processing model; distributed stochastic aware random forest; Data handling; Data mining; Data models; Data storage systems; Information management; Proposals; Stochastic processes; Big Data; Data Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5006-0
Type :
conf
DOI :
10.1109/BigData.Congress.2013.68
Filename :
6597172