DocumentCode :
3704161
Title :
Scalable Learning of k-dependence Bayesian Classifiers under MapReduce
Author :
Jacinto Arias;José A. Gámez;José M.
Author_Institution :
Dept. of Comput. Syst., Univ. of Castilla-La Mancha, Albacete, Spain
Volume :
2
fYear :
2015
Firstpage :
25
Lastpage :
32
Abstract :
In Data Mining there is a constant need for more scalable tools to tackle new domains of increasing complexity. Over the last few years, one of the main challenges in this field has been the growing size of the available data, owing to the data generation and storage capacities provided by newly emerging technologies. In response, a range of new computational paradigms and parallel architectures has been proposed. MapReduce has taken a leading role in Big Data applications since its appearance, and many popular Data Analysis tools and techniques have been successfully adapted to this paradigm. Supervised classification is one of the most common problems in Data Mining, and Bayesian Network Classifiers (BNCs) have become one of the most widespread and competitive techniques for addressing it. In this paper we propose a parallel definition of the KDB (k-dependence Bayesian classifier) algorithm under the MapReduce framework. We focus on obtaining maximum scalability and flexibility by exploring the concepts of vertical and horizontal parallelism, thus addressing both Big Data and High Dimensional problems simultaneously. We analyse its properties and the advantages of applying it to large datasets of different kinds. Finally, we perform an experimental evaluation by testing a Hadoop implementation of our proposal on a high-end computer cluster.
Keywords :
"Bayes methods","Data mining","Complexity theory","Big data","Algorithm design and analysis","Scalability"
Publisher :
IEEE
Conference_Title :
Trustcom/BigDataSE/ISPA, 2015 IEEE
Type :
conf
DOI :
10.1109/Trustcom.2015.558
Filename :
7345471