Regional Information Center for Science and Technology (RICEST) - Research on Web information extraction based on spider algorithm and DOM thinking

DocumentCode :

3108826

Title :

Research on Web information extraction based on spider algorithm and DOM thinking

Author :

Han, Xinchao ; Li, XiangDong ; Zheng, Qiusheng

Author_Institution :

ZhongYuan Univ. of Technol., Zhengzhou, China

Volume :

fYear :

2010

fDate :

18-19 Oct. 2010

Abstract :

Website structures are complex, and Web information is neither fixed nor uniform in structure, so capturing Web information at scale is inefficient and integrating it is very difficult. This paper studies Web information extraction technology and proposes and implements a new method based on a spider algorithm and DOM thinking. Experimental results show that the method can extract information from the Web efficiently and accurately.

Keywords :

Web sites; information retrieval; DOM thinking; Web information extraction; Web information structure; spider algorithm; website; Accuracy; Companies; DOM tree; website structure

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Information Networking and Automation (ICINA), 2010 International Conference on

Conference_Location :

Kunming

Print_ISBN :

978-1-4244-8104-0

Electronic_ISBN :

978-1-4244-8106-4

Type :

conf

DOI :

10.1109/ICINA.2010.5636976

Filename :

5636976

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3108826