Title :
Detecting and Partitioning Data Objects in Complex Web Pages
Author :
Ye, Shiren ; Chua, Tat-Seng
Author_Institution :
National University of Singapore
Abstract :
This paper presents an automated approach to detecting and partitioning data objects or product descriptions in complex Web pages. First, we derive the common page structure by comparing similar pages and then identify the data region that covers the descriptions of the data objects. Second, we partition the nodes in the data region that belong to different data objects and construct self-explanatory XML output files. The experiments indicate that our technique is effective.
Keywords :
Business; Cleaning; Companies; Data mining; Navigation; Object detection; Organizing; Partitioning algorithms; Web pages; XML
Conference_Title :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
DOI :
10.1109/WI.2004.10155