Regional Information Center for Science and Technology

DocumentCode :

2698236

Title :

A generic Web news extraction approach

Author :

Dong, Yongquan ; Li, Qingzhong ; Yan, Zhongmin ; Ding, Yanhui

Author_Institution :

Sch. of Comput. Sci. & Technol., Shandong Univ., Jinan

fYear :

2008

fDate :

20-23 June 2008

Firstpage :

179

Lastpage :

183

Abstract :

With the development of the Internet, the Web is becoming the largest data repository in human history. Major efforts have been made to provide efficient access to relevant information within Web pages. Most previous work relies on site-specific templates. When information such as news must be extracted from different sites, a template has to be created for every site, which costs considerable time and effort. In this paper, we present a generic news extraction method that easily identifies news content based on a set of combined heuristics and extracts every part of a news item according to a predefined schema. Experimental results indicate that our approach is effective in extracting news across Web sites.

Keywords :

Internet; humanities; information retrieval; Web sites; generic Web news extraction approach; news content identification; Automation; Color; Computer science; Costs; Data mining; History; Navigation; Publishing; Web pages

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information and Automation, 2008. ICIA 2008. International Conference on

Conference_Location :

Changsha

Print_ISBN :

978-1-4244-2183-1

Electronic_ISBN :

978-1-4244-2184-8

Type :

conf

DOI :

10.1109/ICINFA.2008.4607992

Filename :

4607992

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2698236