Regional Information Center for Science and Technology - Meta-learning for large scale machine learning with MapReduce

DocumentCode :

659592

Title :

Meta-learning for large scale machine learning with MapReduce

Author :

Xuan Liu ; Xiaoguang Wang ; S. Matwin ; Nathalie Japkowicz

Author_Institution :

Sch. of EECS, Univ. of Ottawa, Ottawa, ON, Canada

fYear :

2013

fDate :

6-9 Oct. 2013

Firstpage :

105

Lastpage :

110

Abstract :

We have entered the big data age. Knowledge extraction from massive data is becoming more and more rewarding and urgent. MapReduce has provided a feasible framework for programming machine learning algorithms in Map and Reduce functions. The relatively simple programming interface has helped to solve machine learning algorithms' scalability problems. However, this framework suffers from an obvious weakness: it does not support iterations. This makes it difficult for algorithms that require iterations to fully exploit the efficiency of MapReduce. In this paper, we propose to apply Meta-learning programmed with MapReduce to avoid parallelizing machine learning algorithms while also improving their scalability to big datasets. The experiments, conducted on Hadoop in fully distributed mode on Amazon EC2, demonstrate that our algorithm, PML, reduces the training computational complexity significantly as the number of computing nodes increases, while achieving smaller error rates than those obtained on a single node. A comparison of PML with the contemporary parallelized AdaBoost algorithm, AdaBoost.PL, shows that PML has lower error rates.
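
The general idea in the abstract (train base learners independently on data partitions in a Map step, then combine them in a Reduce step, with no iteration across nodes) can be illustrated with a small single-process sketch. This is not the authors' PML algorithm: the threshold "stump" base learner, the majority-vote combination rule, the toy data, and every function name below are assumptions chosen only for illustration, and no Hadoop API is used.

```python
from collections import Counter
from functools import reduce


def train_stump(partition):
    """Map step (hypothetical): fit a 1-D threshold classifier on one partition.

    Returns (threshold, label_above) with the fewest training errors.
    """
    best = None
    for threshold, _ in partition:
        for label_above in (0, 1):
            errors = sum(
                (label_above if x > threshold else 1 - label_above) != y
                for x, y in partition
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    return best[1], best[2]


def predict_stump(stump, x):
    threshold, label_above = stump
    return label_above if x > threshold else 1 - label_above


def map_phase(partitions):
    # Each partition is handled independently, so in a real cluster
    # these calls could run on separate nodes without communication.
    return [train_stump(p) for p in partitions]


def reduce_phase(stumps):
    # Reduce step (assumed rule): gather the base models into one ensemble.
    return reduce(lambda ensemble, s: ensemble + [s], stumps, [])


def predict(ensemble, x):
    # Assumed combination rule: simple majority vote over base models.
    votes = Counter(predict_stump(s, x) for s in ensemble)
    return votes.most_common(1)[0][0]


# Toy data: label is 1 when x >= 6, split into three interleaved partitions.
partitions = [
    [(0, 0), (3, 0), (6, 1), (9, 1)],
    [(1, 0), (4, 0), (7, 1), (10, 1)],
    [(2, 0), (5, 0), (8, 1), (11, 1)],
]
ensemble = reduce_phase(map_phase(partitions))
print(predict(ensemble, 1), predict(ensemble, 10))
```

Because each base learner sees only its own partition, a single Map pass and a single Reduce pass suffice, which is the property the abstract relies on to sidestep MapReduce's lack of iteration support.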

Keywords :

Big Data; application program interfaces; knowledge acquisition; learning (artificial intelligence); parallel programming; Amazon EC2; Map function; MapReduce; PML; Reduce function; big dataset scalability improvement; computational complexity; computing nodes; error rates; fully-distributed Hadoop mode; knowledge extraction; large-scale machine learning algorithm programming; machine learning algorithm scalability problems; massive data; meta-learning; programming interface; Algorithm design and analysis; Classification algorithms; Computational modeling; Error analysis; Machine learning algorithms; Training; Training data; Adaboost; MapReduce; big data; meta-learning; parallel computing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

2013 IEEE International Conference on Big Data

Conference_Location :

Silicon Valley, CA

Type :

conf

DOI :

10.1109/BigData.2013.6691741

Filename :

6691741

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=659592