Title :
EaSd: A System for Extracting and Annotating Structured Data
Author :
Zhang, Huibin ; Yuan, Xiaojie ; Yang, Zongyun ; Wen, Yanlong
Author_Institution :
Coll. of Inf. Tech. Sci., Nankai Univ., Tianjin, China
Abstract :
Many Web pages are generated dynamically in response to an online query. These pages contain structured data that are useful for information integration. In this paper, we propose a system, EaSd, that automatically extracts data records from such Web pages and annotates the record attributes. Using vision-based page segmentation (VIPS) as the representation format for Web pages, we handle both problems in a uniform process based on the query instance. For data extraction, VIPS represents Web pages better than a tag tree does and improves the correspondence of the extraction results. EaSd annotates the record attributes using an integrated interface schema, which yields more consistent and complete annotations. Experimental results show the promise of our approach.
Keywords :
Internet; data analysis; data structures; database management systems; feature extraction; query processing; EaSd; VIPS; Web database; Web page representation; data representation format; information integration; integrated interface schema; online query; query instance; structured data annotation; structured data extraction; vision-based page segmentation; Data mining; Educational institutions; Filling; HTML; Intelligent structures; Intelligent systems; Magnetic heads; Relational databases; Tail; Web pages; Deep Web; data annotation; data extraction
Conference_Title :
Intelligent Systems, 2009. GCIS '09. WRI Global Congress on
Conference_Location :
Xiamen
Print_ISBN :
978-0-7695-3571-5
DOI :
10.1109/GCIS.2009.81