DocumentCode :
2861665
Title :
Detecting and Partitioning Data Objects in Complex Web Pages
Author :
Ye, Shiren ; Chua, Tat-Seng
Author_Institution :
National University of Singapore
fYear :
2004
fDate :
20-24 Sept. 2004
Firstpage :
669
Lastpage :
672
Abstract :
This paper presents an automated approach to detecting and partitioning data objects, or product descriptions, in complex Web pages. First, we derive the common page structure by comparing similar pages and then identify the data region covering the descriptions of data objects. Second, we partition the nodes belonging to different data objects in the data region and construct self-explanatory XML output files. The experiments indicate that our technique is effective.
Keywords :
Business; Cleaning; Companies; Data mining; Navigation; Object detection; Organizing; Partitioning algorithms; Web pages; XML;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2004. WI 2004. Proceedings. IEEE/WIC/ACM International Conference on
Print_ISBN :
0-7695-2100-2
Type :
conf
DOI :
10.1109/WI.2004.10155
Filename :
1410893