DocumentCode :
2984419
Title :
Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems
Author :
Hsiang-Fu Yu ; Cho-Jui Hsieh ; Si Si ; Inderjit S. Dhillon
Author_Institution :
Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
fYear :
2012
fDate :
10-13 Dec. 2012
Firstpage :
765
Lastpage :
774
Abstract :
Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD) are two popular approaches to compute matrix factorization, and there has been a recent flurry of activity to parallelize them. However, due to its cubic time complexity in the target rank, ALS is not scalable to large-scale datasets. On the other hand, SGD conducts efficient updates but usually suffers from slow convergence that is sensitive to its parameters. Coordinate descent, a classical optimization approach, has been applied to many other large-scale problems, but its application to matrix factorization for recommender systems has not been explored thoroughly. In this paper, we show that coordinate descent based methods have a more efficient update rule compared to ALS, and converge faster and more stably than SGD. We study different update sequences and propose the CCD++ algorithm, which updates rank-one factors one by one. In addition, CCD++ can be easily parallelized on both multi-core and distributed systems. We empirically show that CCD++ is much faster than ALS and SGD in both settings. As an example, on a synthetic dataset with 2 billion ratings, CCD++ is 4 times faster than both SGD and ALS on a distributed system with 20 machines.
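Code Sketch (editorial) :
A minimal NumPy sketch of the rank-one update scheme the abstract describes: for each rank-one factor, the residual is restored, the factor's two vectors are refit with closed-form one-variable coordinate descent, and the residual is updated. The function name ccdpp, the regularizer lam, the zero initialization of H, and the v-before-u inner update order are illustrative assumptions, not the authors' reference implementation.

import numpy as np
from scipy.sparse import coo_matrix

def ccdpp(A, k=10, lam=0.1, outer_iters=10, inner_iters=3, seed=0):
    """Sketch of CCD++-style rank-one coordinate descent on the observed
    entries of a sparse ratings matrix A (scipy COO, shape m x n)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    rows, cols = A.row, A.col
    W = 0.01 * rng.standard_normal((m, k))  # illustrative random init
    H = np.zeros((n, k))                    # H = 0 makes the initial residual equal A
    R = A.data.astype(np.float64).copy()    # residual A - W H^T on observed entries
    for _ in range(outer_iters):
        for t in range(k):                  # update rank-one factors one by one
            u, v = W[:, t].copy(), H[:, t].copy()
            Rhat = R + u[rows] * v[cols]    # add factor t back into the residual
            for _ in range(inner_iters):
                # closed-form 1-D updates: v_j = sum_i Rhat_ij u_i / (lam + sum_i u_i^2),
                # then the symmetric update for u_i (v before u, since H starts at 0)
                num = np.zeros(n); den = np.full(n, lam)
                np.add.at(num, cols, Rhat * u[rows])
                np.add.at(den, cols, u[rows] ** 2)
                v = num / den
                num = np.zeros(m); den = np.full(m, lam)
                np.add.at(num, rows, Rhat * v[cols])
                np.add.at(den, rows, v[cols] ** 2)
                u = num / den
            R = Rhat - u[rows] * v[cols]    # remove the refit factor from the residual
            W[:, t], H[:, t] = u, v
    return W, H

# Example use: A = coo_matrix((vals, (i, j)), shape=(m, n)); W, H = ccdpp(A)

Because each factor is refit against an explicit residual, the per-update cost is linear in the target rank, in contrast to the cubic cost of the ALS least-squares solves; the paper's parallel variants partition these updates across cores or machines.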
Keywords :
computational complexity; convergence; gradient methods; matrix decomposition; optimisation; parallel algorithms; recommender systems; stochastic processes; ALS; CCD++ algorithm; SGD; alternating least squares; classical optimization approach; coordinate descent based methods; cubic time complexity; distributed systems; large-scale datasets; large-scale problems; multicore systems; parallel algorithms; parallel matrix factorization; recommender systems; scalable coordinate descent; slow convergence; stable convergence; stochastic gradient descent; synthetic dataset; update rule; update sequences; updates rank-one factors; web-scale datasets; Acceleration; Convergence; Least squares approximation; Recommender systems; Time complexity; Low rank approximation; Matrix factorization; Parallelization; Recommender systems
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2012 IEEE 12th International Conference on Data Mining (ICDM)
Conference_Location :
Brussels, Belgium
ISSN :
1550-4786
Print_ISBN :
978-1-4673-4649-8
Type :
conf
DOI :
10.1109/ICDM.2012.168
Filename :
6413853