Regional Information Center for Science and Technology - Parallel and Synchronized UCB2 for Online Recommendation Systems

Abstract:

Because users' preferences shift continuously, a recommendation system has to learn from them quickly. This is an interesting online learning problem, as the recommender has no prior knowledge of the distribution of items over the users. In this work, we generate a small recommendation set from a large number of items, with the intention that at least one of the recommended items will satisfy the user, thereby minimizing user abandonment. We use a multi-armed bandit algorithm for this purpose and employ multiple instances of Upper Confidence Bound 2 (UCB2). Although UCB2 is theoretically proven to have a better regret bound than UCB1, it has not, unlike UCB1, been used for parallel execution. We design an efficient algorithm that runs multiple instances of UCB2 in parallel. Our algorithm handles parameter synchronization, reward updates, and exploration decisions across the multiple instances of UCB2 and ensures that they can cover different types of users. When applied to real data, our method shows performance comparable to that of a recommendation system that runs multiple instances of UCB1 in parallel. We compare our results with the Ranked Bandit Algorithm and the Independent Bandit Algorithm.
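
For readers unfamiliar with UCB2, the following is a minimal Python sketch of a single, sequential UCB2 instance as it is commonly stated in the bandit literature. It is not the parallel, synchronized algorithm described in the abstract, whose details are not given here. The function names (`tau`, `bonus`, `ucb2`), the `pull` callback, the default `alpha=0.5`, and the `max(1, ...)` guard on the epoch length are all assumptions made for illustration.

```python
import math

def tau(r, alpha):
    # Epoch schedule: tau(r) = ceil((1 + alpha)^r).
    return math.ceil((1 + alpha) ** r)

def bonus(n, r, alpha):
    # Exploration bonus a_{n,r} for an arm that has completed r epochs,
    # where n is the total number of plays made so far.
    t = tau(r, alpha)
    return math.sqrt((1 + alpha) * math.log(math.e * n / t) / (2 * t))

def ucb2(pull, n_arms, horizon, alpha=0.5):
    """Run one UCB2 instance. `pull(j)` is a hypothetical callback that
    returns a reward in [0, 1] for arm j. Returns (counts, sums)."""
    counts = [0] * n_arms    # number of plays per arm
    sums = [0.0] * n_arms    # cumulative reward per arm
    epochs = [0] * n_arms    # completed epochs r_j per arm
    n = 0                    # total plays so far

    # Initialization: play each arm once.
    for j in range(n_arms):
        if n >= horizon:
            break
        sums[j] += pull(j)
        counts[j] += 1
        n += 1

    while n < horizon:
        # Choose the arm with the largest empirical mean plus bonus.
        j = max(range(n_arms),
                key=lambda i: sums[i] / counts[i] + bonus(n, epochs[i], alpha))
        # Play it for a whole epoch. The max(1, ...) guard is an assumption
        # added here so that small alpha values never yield an empty epoch.
        length = max(1, tau(epochs[j] + 1, alpha) - tau(epochs[j], alpha))
        for _ in range(min(length, horizon - n)):
            sums[j] += pull(j)
            counts[j] += 1
            n += 1
        epochs[j] += 1
    return counts, sums
```

Because an arm, once selected, is played for a whole epoch of growing length, UCB2 switches arms less often than UCB1, which re-evaluates its choice after every play. In a recommendation setting, an arm would correspond to an item and the reward to a click or similar feedback signal.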