DocumentCode :
3588745
Title :
Budgeted mini-batch parallel gradient descent for support vector machines on Spark
Author :
Hang Tao ; Bin Wu ; Xiuqin Lin
Author_Institution :
Sch. of Comput. Sci., Beijing Univ. of Posts & Telecommun., Beijing, China
fYear :
2014
Firstpage :
945
Lastpage :
950
Abstract :
Mini-batch gradient descent (MBGD) is an attractive choice for training support vector machines (SVMs), because processing only a subset of the examples at a time is advantageous when handling large data sets. Like other SVM learning algorithms, MBGD is vulnerable to the curse of kernelization when equipped with kernel functions: the model size and update time grow linearly and without bound as the data size increases. This paper presents a budgeted mini-batch parallel gradient descent algorithm (BMBPGD) for large-scale kernel SVM training that runs efficiently on Apache Spark, a fast and general engine for large-scale data processing that was originally designed for iterative algorithms. BMBPGD has constant space and time complexity per update. It uses a removal-based budget maintenance method to keep the number of support vectors (SVs) within a fixed budget. The experimental results show that BMBPGD achieves higher accuracy than the SVMWithSGD algorithm in MLlib on Spark and takes much less time than LibSVM.
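The budgeted update described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' BMBPGD implementation and omits the Spark parallelism entirely. The constant step size `eta`, the regularization weight `lam`, the RBF kernel, and the rule "drop the support vectors with the smallest |alpha|" are all assumptions chosen for illustration; the abstract only states that a removal budget maintenance method is used.

```python
import math
import random


def rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2) for two equal-length sequences."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision(model, x, gamma):
    """Kernel expansion f(x) = sum_i alpha_i * K(sv_i, x)."""
    return sum(a * rbf(s, x, gamma) for s, a in model)


def train_budgeted_mbgd(X, y, budget=20, batch_size=16, eta=0.5, lam=0.01,
                        gamma=0.5, epochs=3, seed=0):
    """Hypothetical budgeted mini-batch kernel SVM trainer (labels in {-1, +1})."""
    rng = random.Random(seed)
    model = []  # list of (support_vector, alpha) pairs
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Margin violators, judged against the model before this update.
            viol = [i for i in batch
                    if y[i] * decision(model, X[i], gamma) < 1.0]
            # Regularization step: shrink every existing coefficient.
            model = [(s, a * (1.0 - eta * lam)) for s, a in model]
            # Hinge-loss step: each violator becomes a new support vector.
            model += [(X[i], eta * y[i] / len(batch)) for i in viol]
            # Assumed removal rule: keep only the `budget` largest |alpha|.
            if len(model) > budget:
                model.sort(key=lambda p: abs(p[1]), reverse=True)
                model = model[:budget]
    return model


def predict(model, x, gamma=0.5):
    """Predicted label in {-1, +1} from the sign of the kernel expansion."""
    return 1 if decision(model, x, gamma) >= 0 else -1
```

Because the model never holds more than `budget` support vectors after an update, the cost of evaluating `decision` stays bounded regardless of how many examples have been processed, which is the constant per-update space and time property the abstract refers to.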
Keywords :
computational complexity; data analysis; gradient methods; parallel algorithms; support vector machines; Apache Spark; BMBPGD; MLlib; budgeted minibatch parallel gradient descent algorithm; constant space complexity; constant time complexity; curse-of-kernelization; data size; kernel functions; large-scale data processing; large-scale kernel SVM training; model size; removal budget maintenance method; unbounded linear growth; update time; Accuracy; Algorithm design and analysis; Complexity theory; Sparks; Spark; budget maintenance; kernel method; large-scale learning; mini-batch gradient descent; stochastic gradient descent
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Parallel and Distributed Systems (ICPADS), 2014 20th IEEE International Conference on
Type :
conf
DOI :
10.1109/PADSW.2014.7097914
Filename :
7097914