DocumentCode :
1791592
Title :
Large-scale logistic regression and linear support vector machines using Spark
Author :
Chieh-Yen Lin ; Cheng-Hao Tsai ; Ching-Pei Lee ; Chih-Jen Lin
Author_Institution :
Dept. of Comput. Sci., Nat. Taiwan Univ., Taipei, Taiwan
fYear :
2014
fDate :
27-30 Oct. 2014
Firstpage :
519
Lastpage :
528
Abstract :
Logistic regression and linear SVM are useful methods for large-scale classification. However, their distributed implementations have not been well studied. Recently, because of the inefficiency of the MapReduce framework on iterative algorithms, Spark, an in-memory cluster-computing platform, has been proposed. It has emerged as a popular framework for large-scale data processing and analytics. In this work, we consider a distributed Newton method for solving logistic regression as well as linear SVM and implement it on Spark. We carefully examine many implementation issues significantly affecting the running time and propose our solutions. After conducting thorough empirical investigations, we release an efficient and easy-to-use tool for the Spark community.
Keywords :
Newton method; data analysis; pattern classification; regression analysis; support vector machines; MapReduce framework; Spark platform; distributed Newton method; in-memory cluster-computing platform; iterative algorithm; large-scale classification; large-scale data analytics; large-scale data processing; large-scale logistic regression; linear SVM; linear support vector machines; Fault tolerance; Fault tolerant systems; Newton method; Partitioning algorithms; Sparks; Support vector machines; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Big Data (Big Data), 2014 IEEE International Conference on
Conference_Location :
Washington, DC
Type :
conf
DOI :
10.1109/BigData.2014.7004269
Filename :
7004269