Regional Information Center for Science and Technology (RICeST) - Efficient extraction of news articles based on RSS crawling

DocumentCode :

3277024

Title :

Efficient extraction of news articles based on RSS crawling

Author :

Adam, George ; Bouras, Christos ; Poulopoulos, Vassilis

Author_Institution :

Comput. & Inf. Eng. Dept., Univ. of Patras, Patras, Greece

fYear :

2010

fDate :

3-5 Oct. 2010

Firstpage :

Lastpage :

Abstract :

The expansion of the World Wide Web has led to a state where a vast number of Internet users face the major problem of discovering desired information. Hundreds of web pages and weblogs are inevitably generated or changed every day. The main problem arising from this continuous generation and alteration of web pages is the discovery of useful information, a task that is difficult even for experienced Internet users. Many mechanisms have been constructed and presented to overcome the puzzle of information discovery on the Internet; most are based on crawlers that browse the WWW, download pages, and collect information that might be of interest to users. In this manuscript we describe a mechanism that fetches web pages containing news articles from major news portals and blogs. This mechanism is built to support tools that acquire news articles from all over the world, process them, and present them back to end users in a personalized manner.

Keywords :

Internet; information retrieval; Internet users; RSS crawling; Web pages; Weblogs; World Wide Web; news articles extraction; Computers; Crawlers; Databases; Feeds; History; Training; crawling; extraction; feeds; news; rss

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine and Web Intelligence (ICMWI), 2010 International Conference on

Conference_Location :

Algiers

Print_ISBN :

978-1-4244-8608-3

Type :

conf

DOI :

10.1109/ICMWI.2010.5647851

Filename :

5647851

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3277024