Title :
Meta-learning for large scale machine learning with MapReduce
Author :
Xuan Liu ; Xiaoguang Wang ; Matwin, S. ; Japkowicz, Nathalie
Author_Institution :
Sch. of EECS, Univ. of Ottawa, Ottawa, ON, Canada
Abstract :
We have entered the big data age, and extracting knowledge from massive data is becoming increasingly rewarding and urgent. MapReduce provides a feasible framework for programming machine learning algorithms as Map and Reduce functions, and its relatively simple programming interface has helped to address the scalability problems of machine learning algorithms. However, the framework has an obvious weakness: it does not support iterations. This makes it difficult for algorithms that require iterations to fully exploit the efficiency of MapReduce. In this paper, we propose applying meta-learning programmed with MapReduce, which avoids parallelizing the machine learning algorithms themselves while still improving their scalability to big datasets. Experiments conducted on Hadoop in fully distributed mode on Amazon EC2 demonstrate that our algorithm, PML, significantly reduces the training computational complexity as the number of computing nodes increases, while achieving lower error rates than training on a single node. A comparison of PML with a contemporary parallelized AdaBoost algorithm, AdaBoost.PL, shows that PML has lower error rates.
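The abstract does not spell out how PML works, so the following is only a minimal sketch of the general idea it names: train independent base learners on data partitions in a map step, then combine them in a reduce step so that no single learner needs to be parallelized or iterated across nodes. Everything here is an assumption made for illustration: the nearest-centroid base learner, the majority-vote combiner, and the names `train_centroid`, `map_phase`, `reduce_phase`, and `predict_ensemble` are hypothetical and are not taken from the paper. The sketch also runs in a single Python process rather than on Hadoop.

```python
from collections import Counter
from functools import reduce


def train_centroid(partition):
    """Fit a nearest-centroid classifier on one partition (stand-in base learner)."""
    sums, counts = {}, Counter()
    for features, label in partition:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    # The model is a mapping from label to the mean feature vector of that label.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, features):
    """Return the label whose centroid is closest to the given features."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def map_phase(partitions):
    """Map step: each partition independently yields one trained base model."""
    return [train_centroid(partition) for partition in partitions]


def reduce_phase(models):
    """Reduce step: gather the base models into a single ensemble list."""
    return reduce(lambda ensemble, model: ensemble + [model], models, [])


def predict_ensemble(ensemble, features):
    """Combine base predictions by majority vote (assumed combiner, not PML's)."""
    votes = Counter(predict_centroid(model, features) for model in ensemble)
    return votes.most_common(1)[0][0]


# Toy data: three partitions, each holding (features, label) pairs.
partitions = [
    [((0.0, 0.1), "a"), ((1.0, 0.9), "b")],
    [((0.2, 0.0), "a"), ((0.9, 1.1), "b")],
    [((0.1, 0.2), "a"), ((1.1, 1.0), "b")],
]
ensemble = reduce_phase(map_phase(partitions))
print(predict_ensemble(ensemble, (0.05, 0.05)))  # prints "a"
print(predict_ensemble(ensemble, (1.0, 1.0)))    # prints "b"
```

Because each base model is trained in a single pass over its own partition, the map step needs no communication between partitions, which is the property that sidesteps MapReduce's lack of iteration support in this sketch.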
Keywords :
Big Data; application program interfaces; knowledge acquisition; learning (artificial intelligence); parallel programming; Amazon EC2; Map function; MapReduce; PML; Reduce function; big dataset scalability improvement; computational complexity; computing nodes; error rates; fully-distributed Hadoop mode; knowledge extraction; large-scale machine learning algorithm programming; machine learning algorithm scalability problems; massive data; meta-learning; programming interface; Algorithm design and analysis; Classification algorithms; Computational modeling; Error analysis; Machine learning algorithms; Training; Training data; AdaBoost; parallel computing
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
DOI :
10.1109/BigData.2013.6691741