DocumentCode
496878
Title
Automatic Data Records Extraction from List Page in Deep Web Sources
Author
Hong-ping, Chen ; Wei, Fang ; Zhou, Yang ; Lin, Zhuo ; Zhi-Ming, Cui
Author_Institution
Inst. of Intell. Inf. Process. & Applic., Soochow Univ., Suzhou, China
Volume
1
fYear
2009
fDate
18-19 July 2009
Firstpage
370
Lastpage
373
Abstract
With the explosive growth and popularity of the World Wide Web, a wealth of online e-commerce information resources has become available. List pages on these Web sites are usually generated automatically from a back-end DBMS using scripts. To provide value-added services and convenience for users, it is necessary to integrate Web sources from the same domain. Given the huge number of such Web pages, extracting data records from these list pages manually is difficult, if not impossible, on a large scale. Based on the characteristics of template-based list pages, this paper proposes the LBDRF (layout-based data region finding) algorithm to automatically extract data records from list pages in deep Web sources. Our experimental results show that the proposed method performs well.
Keywords
Web services; Web sites; data mining; database management systems; electronic commerce; hypermedia markup languages; information retrieval; search engines; DOM tree model; LBDRF; World Wide Web; back-end DBMS; data record extraction; deep Web source; document object model; layout-based data region finding; list page; online e-commerce information resource; value-added service; Books; Clustering algorithms; Data mining; Explosives; Information processing; Information resources; Large-scale systems; Search engines; Web pages; Data Extraction; Data record; Deep Web
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Processing, 2009. APCIP 2009. Asia-Pacific Conference on
Conference_Location
Shenzhen
Print_ISBN
978-0-7695-3699-6
Type
conf
DOI
10.1109/APCIP.2009.100
Filename
5197073