Title :
MR-VSM: Map Reduce based vector Space Model for user profiling-an empirical study on News data
Author :
Anjali Gautam;Punam Bedi
Author_Institution :
Department of Computer Science, University of Delhi, India
Abstract :
The velocity of data generation has increased over the past decade and is expected to keep increasing exponentially. To mine useful nuggets of information for a large community of users, it is preferable to capture each user's interests, i.e., to create a user profile, and then filter content according to that user's taste. A user may traverse a large number of documents, so the user profiling technique must scale with the growing number of documents. This paper proposes a novel user profiling technique, the Map Reduce based Vector Space Model (MR-VSM), intended for settings where the user interacts with data that is rich in text and large in volume. MR-VSM implements the traditional VSM on Map Reduce, a parallel programming paradigm, to increase computational efficiency and to scale with the number of documents. It works by parallelizing the task of creating the term-document class of VSM, using TF-IDF to create the term vectors. For the experimental study, this paper uses a News dataset, rich in text and volume, collected from the web using RSS feeds. The proposed system creates a user profile by considering the news items read by the user and creating a term vector for each news item read. The resulting user profile is a set of Top-n terms. To test the computational efficiency and scalability of MR-VSM as the number of news items read by the user grows, MR-VSM is run on a Hadoop cluster for 12,000, 24,000 and 48,000 news items. Traditional VSM is also run for 1,500 news items, for comparison, to show the computational efficiency of the proposed approach. It is observed that, for MR-VSM, both the computational time for user profiling and the scalability in the number of news items read by the user improve as the number of nodes in the Hadoop cluster increases.
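The pipeline described in the abstract (parallel TF-IDF term vectors per news item, then a Top-n term profile) can be illustrated with a minimal, single-process Python sketch. It only mimics the map and reduce phases in memory and is not the authors' Hadoop implementation. The function names, the TF-IDF variant (raw count times log(N/df)), and the choice to build the profile by summing term weights over all read items are assumptions made for illustration; the abstract does not specify them.

```python
import math
import re
from collections import defaultdict


def map_term_counts(doc_id, text):
    """Map phase: emit ((term, doc_id), 1) for each token of one news item."""
    for term in re.findall(r"[a-z]+", text.lower()):
        yield (term, doc_id), 1


def reduce_by_key(pairs):
    """Shuffle and reduce phase: sum all values that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def tf_idf_vectors(docs):
    """Build one TF-IDF term vector per news item from {doc_id: text}."""
    # Job 1: raw term frequency for every (term, document) pair.
    pairs = [p for doc_id, text in docs.items()
             for p in map_term_counts(doc_id, text)]
    tf = reduce_by_key(pairs)
    # Job 2: document frequency, i.e. number of items containing each term.
    df = reduce_by_key((term, 1) for (term, _doc_id) in tf)
    n_docs = len(docs)
    vectors = defaultdict(dict)
    for (term, doc_id), count in tf.items():
        # Assumed weighting: raw count * log(N / df).
        vectors[doc_id][term] = count * math.log(n_docs / df[term])
    return dict(vectors)


def top_n_profile(vectors, n):
    """Assumed profile rule: sum weights over read items, keep the n heaviest terms."""
    weights = reduce_by_key(
        (term, weight) for vec in vectors.values() for term, weight in vec.items()
    )
    # Sort by descending weight; break ties alphabetically for determinism.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _weight in ranked[:n]]


if __name__ == "__main__":
    # Hypothetical news items read by one user.
    read_items = {
        "d1": "cricket cricket match",
        "d2": "election match",
        "d3": "election budget budget budget",
    }
    print(top_n_profile(tf_idf_vectors(read_items), 2))
```

In a real Hadoop deployment each job would run as distributed mapper and reducer tasks over partitions of the news items; the in-memory dictionaries here stand in for the shuffle step.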
Keywords :
"Scalability","Feeds","Computational efficiency","Computational modeling","Filtering","Informatics","Databases"
Conference_Titel :
Advances in Computing, Communications and Informatics (ICACCI), 2015 International Conference on
Print_ISBN :
978-1-4799-8790-0
DOI :
10.1109/ICACCI.2015.7275635