Title :
Cloud Press: A next generation news retrieval system on the cloud
Author :
Raj, D. Arockia Anand ; Mala, T.
Author_Institution :
Dept. of Inf. Sci. & Technol., Anna Univ., Chennai, India
Abstract :
The information available on the Internet is growing at a very high rate; news articles in particular are added and updated round the clock. News retrieval systems in use today cannot handle such large volumes of news articles effectively and accurately. Because of the need for frequent and intensive processing, a news retrieval system needs to be scalable, robust and fault tolerant. This can be achieved with cloud technology. A news retrieval system on the cloud can fetch, process and organize news articles and support faster and more accurate retrieval. It can be made to operate with little or no supervision. Cloud Press, the next generation news retrieval system presented here, is designed and implemented to overcome most of the pitfalls of the news retrieval systems in place today. It uses the MapReduce paradigm for fetching, processing and organizing news articles in a distributed fashion. The MapReduce approach splits tasks into sub-tasks and assigns them to various nodes in the cloud; the sub-task results are then consolidated into one final output. This greatly reduces processing time. Cloud Press uses various novel algorithms for parallel crawling of the web and distributed processing of the news articles. A distributed database is used for storing and indexing news articles. The retrieval system also includes a query expansion feature for searching news articles and a novel visualization technique for displaying the retrieved articles.
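Illustrative note (not part of the original record): the split, assign and consolidate pattern described in the abstract can be sketched roughly as below. This is a minimal single-machine sketch, not the Cloud Press algorithms, which the abstract does not specify. The function names (map_article, shuffle, reduce_counts, term_counts) are hypothetical, a thread pool stands in for cloud nodes, and term counting is an assumed example task.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_article(article):
    """Map step (hypothetical): emit a (term, 1) pair for each word in one article."""
    return [(word.lower(), 1) for word in article.split()]


def shuffle(mapped):
    """Group the emitted values by key so each term can be reduced independently."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: consolidate the grouped values into one final output."""
    return {key: sum(values) for key, values in groups.items()}


def term_counts(articles, workers=2):
    """Run the map step on a pool of workers (a stand-in for cloud nodes), then reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_article, articles))
    return reduce_counts(shuffle(mapped))
```

Calling term_counts on a list of article strings returns a dictionary of term frequencies. In a real deployment the map and reduce steps would run on separate nodes under a framework such as Hadoop, rather than in local threads.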
Keywords :
Internet; cloud computing; data visualisation; electronic publishing; indexing; query processing; MapReduce paradigm; cloud press; cloud technology; distributed processing; news article indexing; news articles; next generation news retrieval system; parallel Web crawling; retrieved news articles; visualization technique; Crawlers; Distributed databases; Generators; Indexes; Three dimensional displays; Visualization; XML; Distributed systems; Information visualization; Parallel algorithms; Retrieval models;
Conference_Title :
Recent Advances in Computing and Software Systems (RACSS), 2012 International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4673-0252-4
DOI :
10.1109/RACSS.2012.6212684