DocumentCode
2861665
Title
Detecting and Partitioning Data Objects in Complex Web Pages
Author
Ye, Shiren ; Chua, Tat-Seng
Author_Institution
National University of Singapore
fYear
2004
fDate
20-24 Sept. 2004
Firstpage
669
Lastpage
672
Abstract
This paper presents an automated approach to detecting and partitioning data objects or product descriptions in complex Web pages. First, we derive the common page structure by comparing similar pages, and then identify the data region covering the descriptions of data objects. Second, we partition the nodes belonging to different data objects in the data region and construct self-explanatory XML output files. The experiments indicate that our technique is effective.
Keywords
Business; Cleaning; Companies; Data mining; Navigation; Object detection; Organizing; Partitioning algorithms; Web pages; XML;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN
0-7695-2100-2
Type
conf
DOI
10.1109/WI.2004.10155
Filename
1410893