Title :
Data Mining of Mass Storage Based on Cloud Computing
Author :
Wang, Jianzong ; Wan, Jiguang ; Liu, Zhuo ; Wang, Peng
Author_Institution :
Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
Cloud computing is an elastic computing model in which users lease resources from a rentable infrastructure. It is gaining popularity due to its lower cost, high reliability, and high availability. To exploit this capability, this paper applies cloud computing to the field of data mining and machine learning. The Netflix Prize, one of the most influential open competitions in machine learning and one whose dataset requires mass storage, drove thousands of teams across the world to attack the problem. The final winner was the BellKor's Pragmatic Chaos team, which bested Netflix's own algorithm for predicting ratings by 10%. Their solution is an ensemble of a large number of models, each of which specializes in addressing a different aspect of the data. Among these models, k-nearest neighbors (KNN) and the Restricted Boltzmann Machine (RBM) are reported to be the two most important and successful. We therefore build two predictors, one based on each model, in order to test their performance on cloud computing platforms. The results show that KNN achieves a root mean square deviation (RMSE) of 0.9468 after Global Effect (GE) data preprocessing, which is better than Cinematch's RMSE of 0.951. The RMSE of the RBM algorithm is about 0.9670 on the raw dataset, which can be further improved by the KNN model.
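The RMSE figures quoted in the abstract can be put in context with a minimal sketch of the metric, assuming the usual definition (the square root of the mean squared difference between predicted and actual ratings). The function names `rmse` and `relative_improvement` and the toy ratings are illustrative assumptions, not taken from the paper; only the values 0.951 and 0.9468 come from the abstract.

```python
import math


def rmse(predicted, actual):
    """Root mean square deviation between two equal-length rating sequences."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction of a model relative to a baseline."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Toy ratings invented for illustration only (not Netflix Prize data).
toy_actual = [4, 3, 5, 2]
toy_predicted = [3.5, 3.0, 4.0, 2.5]
print(rmse(toy_predicted, toy_actual))

# RMSE values quoted in the abstract: Cinematch 0.951, KNN after GE 0.9468.
print(relative_improvement(0.951, 0.9468))
```

Under this definition, the KNN result after GE preprocessing corresponds to a reduction of a little under half a percent relative to the quoted Cinematch RMSE.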
Keywords :
Boltzmann machines; cloud computing; data mining; learning (artificial intelligence); Netflix Prize; k-nearest neighbors; machine learning field; mass storage; restricted Boltzmann machine; root mean square deviation
Conference_Title :
Grid and Cooperative Computing (GCC), 2010 9th International Conference on
Conference_Location :
Nanjing
Print_ISBN :
978-1-4244-9334-0
Electronic_ISBN :
978-0-7695-4313-0
DOI :
10.1109/GCC.2010.89