Regional Information Center for Science and Technology - Web page repetitive structure and URL feature based Deep Web data extraction

DocumentCode :

2363970

Title :

Web page repetitive structure and URL feature based Deep Web data extraction

Author :

Li, Xingyi ; Kong, Yanyan ; Shi, Huaji

Author_Institution :

Sch. of Comput. Sci. & Telecommun. Eng., Jiangsu Univ., Zhenjiang, China

Volume :

fYear :

2010

fDate :

June 29 2010-July 1 2010

Firstpage :

361

Lastpage :

364

Abstract :

Noise interference in web pages and the need for multiple sample pages are key issues in Deep Web data extraction. In this paper, we propose a novel approach to Deep Web data extraction based on web page repetitive structure and URL features. It uses continuous repetitive tag regions and similar URLs to partition a sample page into blocks, locate the data region, and extract a specific URL template, which is then used to quickly identify the data region and the boundaries of data records in similar pages. Experimental results show that our approach is highly effective for Deep Web data extraction.
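
The URL-template idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the record does not describe in detail; it only shows one plausible way to reduce links to a coarse template and pick the most frequent one as a hint for the data region. The function names `url_template` and `dominant_template`, the placeholder tokens `{n}` and `{v}`, and the `example.com` links are all assumptions made for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def url_template(url):
    """Reduce a URL to a coarse template (hypothetical helper).

    Keeps the host, replaces digit runs in the path with {n},
    and keeps query parameter names while dropping their values.
    """
    parts = urlsplit(url)
    path = re.sub(r"\d+", "{n}", parts.path)
    keys = sorted(k for k, _ in parse_qsl(parts.query, keep_blank_values=True))
    query = "&".join(f"{k}={{v}}" for k in keys)
    return f"{parts.netloc}{path}" + (f"?{query}" if query else "")


def dominant_template(urls):
    """Return (template, count) for the most frequent template, or None if no URLs."""
    if not urls:
        return None
    counts = Counter(url_template(u) for u in urls)
    return counts.most_common(1)[0]


# Illustrative links as they might appear on a result page (made-up data).
links = [
    "http://example.com/item?id=1&cat=a",
    "http://example.com/item?id=2&cat=b",
    "http://example.com/about",
    "http://example.com/item?id=3&cat=c",
]
template, count = dominant_template(links)
```

In this sketch, links that share the dominant template would be treated as candidate data-record links, while the remaining links (navigation, advertisements) would be treated as noise. Detecting continuous repetitive tag regions, the other half of the approach, is not covered here.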

Keywords :

Internet; Web sites; information retrieval; Deep Web data extraction; URL feature; Web page repetitive structure; noise interference; Accuracy; Data mining; Educational institutions; Feature extraction; Deep Web; data extraction; similar URL; web page repetitive structure;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Communication Systems, Networks and Applications (ICCSNA), 2010 Second International Conference on

Conference_Location :

Hong Kong

Print_ISBN :

978-1-4244-7475-2

Type :

conf

DOI :

10.1109/ICCSNA.2010.5588744

Filename :

5588744

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2363970