DocumentCode :
633063
Title :
A Database-Hadoop Hybrid Approach to Scalable Machine Learning
Author :
Yui, Makoto ; Kojima, Isao
Author_Institution :
Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan
fYear :
2013
fDate :
June 27, 2013 - July 2, 2013
Firstpage :
1
Lastpage :
8
Abstract :
There are two popular schools of thought for performing large-scale machine learning on data that does not fit into memory. One is to run machine learning within a relational database management system; the other is to push analytical functions into MapReduce. As each approach has its own pros and cons, we propose a database-Hadoop hybrid approach to scalable machine learning in which batch learning is performed on the Hadoop platform, while incremental learning is performed on PostgreSQL. We propose a purely relational approach that removes the scalability limitation of previous approaches based on user-defined aggregates, and we also discuss issues and resolutions in applying the proposed approach to Hadoop/Hive. Experimental evaluations of classification performance and training speed were conducted using a commercial advertisement dataset provided in KDD Cup 2012, Track 2. The experimental results show that our scheme has competitive classification performance and superior training speed compared with state-of-the-art scalable machine learning frameworks: for a regression task, it is 5 and 7.65 times faster than Vowpal Wabbit and Bismarck, respectively.
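The keywords list stochastic gradient descent, logistic regression, and iterative parameter mixture. To illustrate roughly how batch learning can be split across Hadoop tasks, here is a minimal single-process Python sketch. It assumes the common formulation of iterative parameter mixture: each partition runs one SGD epoch starting from the current shared weights, and the per-partition weights are then averaged. The partitions stand in for map tasks. The names `sgd_epoch` and `iterative_parameter_mixture`, and the sparse dict representation, are hypothetical and do not reproduce the authors' Hive or PostgreSQL implementation.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical sparse feature vector: feature name -> value.
Features = Dict[str, float]
Example = Tuple[Features, int]  # (features, label in {0, 1})


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Features, x: Features) -> float:
    """Return P(y = 1 | x) under a logistic regression model."""
    z = sum(weights.get(name, 0.0) * value for name, value in x.items())
    return sigmoid(z)


def sgd_epoch(weights: Features, examples: List[Example], eta: float) -> Features:
    """One SGD pass over a partition, starting from a copy of `weights`."""
    w = dict(weights)
    for x, y in examples:
        # The log-likelihood gradient for one example is (y - p) * x.
        err = y - predict(w, x)
        for name, value in x.items():
            w[name] = w.get(name, 0.0) + eta * err * value
    return w


def iterative_parameter_mixture(
    partitions: List[List[Example]], epochs: int, eta: float
) -> Features:
    """Train every partition independently, average the weights, repeat."""
    mixed: Features = {}
    for _ in range(epochs):
        # In a MapReduce job, each call here would run in its own map task.
        local = [sgd_epoch(mixed, part, eta) for part in partitions]
        # Reduce step: uniform average of the per-partition weight vectors.
        names = set().union(*local)
        mixed = {n: sum(w.get(n, 0.0) for w in local) / len(local) for n in names}
    return mixed


# Toy usage: two partitions of a 1-D problem where the label is 1 iff x > 0.
partitions = [
    [({"bias": 1.0, "x": 2.0}, 1), ({"bias": 1.0, "x": -2.0}, 0)],
    [({"bias": 1.0, "x": 1.5}, 1), ({"bias": 1.0, "x": -1.0}, 0)],
]
w = iterative_parameter_mixture(partitions, epochs=10, eta=0.5)
print(predict(w, {"bias": 1.0, "x": 3.0}) > 0.5)   # True
print(predict(w, {"bias": 1.0, "x": -3.0}) < 0.5)  # True
```

Averaging after every epoch, rather than only once at the end, lets the partitions share progress between passes. The learning rate `eta`, the epoch count, and the toy data are arbitrary choices made for illustration.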
Keywords :
learning (artificial intelligence); parallel programming; regression analysis; relational databases; PostgreSQL; batch learning; classification performance; database-Hadoop hybrid approach; incremental learning; purely relational approach; regression task; relational database management system; scalable machine learning; user-defined aggregates; Aggregates; Computational modeling; Machine learning algorithms; Predictive models; Relational databases; Training; Hadoop; MapReduce; in-database analytics; iterative parameter mixture; logistic regression; stochastic gradient descent;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data (BigData Congress), 2013 IEEE International Congress on
Conference_Location :
Santa Clara, CA
Print_ISBN :
978-0-7695-5006-0
Type :
conf
DOI :
10.1109/BigData.Congress.2013.10
Filename :
6597112