DocumentCode :
659455
Title :
Parallel matrix factorization for binary response
Author :
Khanna, Rajiv ; Liang Zhang ; Agarwal, Deepak ; Bee-chung Chen
fYear :
2013
fDate :
6-9 Oct. 2013
Firstpage :
430
Lastpage :
438
Abstract :
Predicting user affinity to items is an important problem in applications such as content optimization and computational advertising. While matrix factorization methods provide state-of-the-art performance when minimizing RMSE through a Gaussian response model on explicit ratings data, applying them to imbalanced binary response data presents additional challenges that we carefully study in this paper. Data in many applications usually consist of users' implicit responses that are binary (clicking an item or not); the goal is to predict click rates (i.e., probabilities), which are often combined with other measures of utility to rank items at runtime. Because of their implicit nature, such data are usually much larger than explicit rating data but often have an imbalanced distribution with a small fraction of click events, making accurate click rate prediction difficult. In this paper, we address two problems. First, we show that previous techniques for estimating factor models with binary data are less accurate than our new approach based on adaptive rejection sampling, especially for imbalanced responses. Second, we develop a parallel matrix factorization framework using Map-Reduce that scales to massive datasets. Our parallel algorithm is based on a “divide and conquer” strategy coupled with an ensemble approach. Through experiments on two benchmark data sets and a large Yahoo! Front Page Today Module data set that contains 8M users and 1B binary observations, we show that careful handling of binary response is needed to achieve good performance for click rate prediction, and that the proposed adaptive rejection sampler and the partitioning and ensemble techniques significantly improve performance.
Keywords :
Gaussian processes; learning (artificial intelligence); matrix decomposition; mean square error methods; parallel programming; recommender systems; Gaussian response model; MapReduce; RMSE minimization; Yahoo Front Page Today Module data set; benchmark data sets; binary response; click rate prediction; click rates prediction; computational advertising; content optimization; ensemble approach; explicit rating data; explicit ratings data; imbalanced binary response data; implicit rating data; matrix factorization methods; parallel matrix factorization; root mean square error methods; user affinity prediction; Adaptation models; Collaboration; Computational modeling; Data models; Logistics; Partitioning algorithms; Vectors;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Big Data, 2013 IEEE International Conference on
Conference_Location :
Silicon Valley, CA
Type :
conf
DOI :
10.1109/BigData.2013.6691604
Filename :
6691604