DocumentCode :
3277024
Title :
Efficient extraction of news articles based on RSS crawling
Author :
Adam, George ; Bouras, Christos ; Poulopoulos, Vassilis
Author_Institution :
Comput. & Inf. Eng. Dept., Univ. of Patras, Patras, Greece
fYear :
2010
fDate :
3-5 Oct. 2010
Firstpage :
1
Lastpage :
7
Abstract :
The expansion of the World Wide Web has led to a state where a vast number of Internet users face, and have to overcome, the major problem of discovering desired information. Inevitably, hundreds of web pages and weblogs are generated or changed on a daily basis. The main problem that arises from the continuous generation and alteration of web pages is the discovery of useful information, a task that becomes difficult even for experienced Internet users. Many mechanisms have been constructed and presented in order to overcome the puzzle of information discovery on the Internet, and they are mostly based on crawlers that browse the WWW, download pages and collect the information that might be of interest to the user. In this manuscript we describe a mechanism that fetches web pages that include news articles from major news portals and blogs. This mechanism is constructed in order to support tools that are used to acquire news articles from all over the world, process them and present them back to the end users in a personalized manner.
Keywords :
Internet; information retrieval; Internet users; RSS crawling; Web pages; Weblogs; World Wide Web; news articles extraction; Computers; Crawlers; Databases; Feeds; History; Training; crawling; extraction; feeds; news; rss
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Machine and Web Intelligence (ICMWI), 2010 International Conference on
Conference_Location :
Algiers
Print_ISBN :
978-1-4244-8608-3
Type :
conf
DOI :
10.1109/ICMWI.2010.5647851
Filename :
5647851