Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

Author

Hsiang-Fu Yu ; Cho-Jui Hsieh ; Si si ; Dhillon, I.

Author_Institution

Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA

fYear

2012

fDate

10-13 Dec. 2012

Firstpage

765

Lastpage

774

Abstract

Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD) are two popular approaches to compute matrix factorization. There has been a recent flurry of activity to parallelize these algorithms. However, due to the cubic time complexity in the target rank, ALS is not scalable to large-scale datasets. On the other hand, SGD conducts efficient updates but usually suffers from slow convergence that is sensitive to the parameters. Coordinate descent, a classical optimization approach, has been used for many other large-scale problems, but its application to matrix factorization for recommender systems has not been explored thoroughly. In this paper, we show that coordinate descent based methods have a more efficient update rule compared to ALS, and are faster and have more stable convergence than SGD. We study different update sequences and propose the CCD++ algorithm, which updatesrank-one factors one by one. In addition, CCD++ can be easily parallelized on both multi-core and distributed systems. We empirically show that CCD++ is much faster than ALS and SGD in both settings. As an example, on a synthetic dataset with 2 billion ratings, CCD++ is 4 times faster than both SGD and ALS using a distributed system with 20 machines.

Keywords

computational complexity; convergence; gradient methods; matrix decomposition; optimisation; parallel algorithms; recommender systems; stochastic processes; ALS; CCD++ algorithm; SGD; alternating least squares; classical optimization approach; coordinate descent based methods; cubic time complexity; distributed systems; large-scale datasets; large-scale problems; multicore systems; parallel algorithms; parallel matrix factorization; recommender systems; scalable coordinate descent; slow convergence; stable convergence; stochastic gradient descent; synthetic dataset; update rule; update sequences; updates rank-one factors; web-scale datasets; Acceleration; Charge coupled devices; Convergence; Least squares approximation; Recommender systems; Time complexity; Low rank approximation; Matrix factorization; Parallelization; Recommender systems;

fLanguage

English

Publisher

ieee

Conference_Titel

Data Mining (ICDM), 2012 IEEE 12th International Conference on

Conference_Location

Brussels

ISSN

1550-4786

Print_ISBN

978-1-4673-4649-8

Type

conf

DOI

10.1109/ICDM.2012.168

Filename

6413853