• DocumentCode
    2984419
  • Title

    Scalable Coordinate Descent Approaches to Parallel Matrix Factorization for Recommender Systems

  • Author

    Hsiang-Fu Yu ; Cho-Jui Hsieh ; Si si ; Dhillon, I.

  • Author_Institution
    Dept. of Comput. Sci., Univ. of Texas at Austin, Austin, TX, USA
  • fYear
    2012
  • fDate
    10-13 Dec. 2012
  • Firstpage
    765
  • Lastpage
    774
  • Abstract
    Matrix factorization, when the matrix has missing values, has become one of the leading techniques for recommender systems. To handle web-scale datasets with millions of users and billions of ratings, scalability becomes an important issue. Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD) are two popular approaches to compute matrix factorization. There has been a recent flurry of activity to parallelize these algorithms. However, due to the cubic time complexity in the target rank, ALS is not scalable to large-scale datasets. On the other hand, SGD conducts efficient updates but usually suffers from slow convergence that is sensitive to the parameters. Coordinate descent, a classical optimization approach, has been used for many other large-scale problems, but its application to matrix factorization for recommender systems has not been explored thoroughly. In this paper, we show that coordinate descent based methods have a more efficient update rule compared to ALS, and are faster and have more stable convergence than SGD. We study different update sequences and propose the CCD++ algorithm, which updatesrank-one factors one by one. In addition, CCD++ can be easily parallelized on both multi-core and distributed systems. We empirically show that CCD++ is much faster than ALS and SGD in both settings. As an example, on a synthetic dataset with 2 billion ratings, CCD++ is 4 times faster than both SGD and ALS using a distributed system with 20 machines.
  • Keywords
    computational complexity; convergence; gradient methods; matrix decomposition; optimisation; parallel algorithms; recommender systems; stochastic processes; ALS; CCD++ algorithm; SGD; alternating least squares; classical optimization approach; coordinate descent based methods; cubic time complexity; distributed systems; large-scale datasets; large-scale problems; multicore systems; parallel algorithms; parallel matrix factorization; recommender systems; scalable coordinate descent; slow convergence; stable convergence; stochastic gradient descent; synthetic dataset; update rule; update sequences; updates rank-one factors; web-scale datasets; Acceleration; Charge coupled devices; Convergence; Least squares approximation; Recommender systems; Time complexity; Low rank approximation; Matrix factorization; Parallelization; Recommender systems;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Data Mining (ICDM), 2012 IEEE 12th International Conference on
  • Conference_Location
    Brussels
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4673-4649-8
  • Type

    conf

  • DOI
    10.1109/ICDM.2012.168
  • Filename
    6413853