Title :
Construct the XQuery-based wrapper for extracting web data
Author :
Tiezheng Nie ; Derong Shen ; Ge Yu ; Yue Kou ; Dan Yang
Author_Institution :
Coll. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
Abstract :
Web pages contain a large amount of structured data that many advanced applications require. However, existing extraction approaches lack compatibility with such applications. This paper proposes a web data extraction model that builds an XQuery-based wrapper for extracting data from web pages. We first annotate data values with XPath expressions in the XML documents of sample pages. We then design an algorithm that generates XQuery statements, which extract data from XML documents and output the results in a structured or semi-structured format. Since XQuery is a well-known standard for operating on XML data and is supported by most database systems and applications, our wrapper is highly compatible with most applications. The experimental results demonstrate that the proposed approach is feasible for extracting web data, which is important for web data integration.
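The abstract does not give the generation algorithm itself, so the following is only a minimal sketch of the general idea: turning XPath annotations into an XQuery FLWOR statement. The function name build_xquery, the output element names results and record, the file name sample.xml, and the sample XPath expressions are all assumptions made for illustration and do not come from the paper.

```python
from typing import Dict


def build_xquery(doc_uri: str, record_xpath: str, fields: Dict[str, str]) -> str:
    """Build a FLWOR XQuery that emits one <record> element per matched node.

    Hypothetical helper: record_xpath locates each repeated data record,
    and fields maps an output element name to an XPath relative to it.
    """
    # One child element per annotated field; data() extracts the atomic value.
    field_lines = "\n".join(
        f"    <{name}>{{data($r/{rel_path})}}</{name}>"
        for name, rel_path in fields.items()
    )
    return (
        "<results>{\n"
        f'  for $r in doc("{doc_uri}"){record_xpath}\n'
        "  return <record>\n"
        f"{field_lines}\n"
        "  </record>\n"
        "}</results>"
    )


# Assumed annotations for a sample page already converted to XML.
query = build_xquery(
    "sample.xml",
    "//table[@id='books']/tr",
    {"title": "td[1]", "price": "td[2]"},
)
print(query)
```

The generated string is plain XQuery, so in principle it could be handed to any XQuery-capable database or processor, which is the compatibility argument the abstract makes.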
Keywords :
Web services; XML; data integrity; data structures; document handling; query processing; Web data extraction; Web data integration; Web pages; XML documents; XPATH; XQuery-based wrapper; data structure; semi-structured format; Accuracy; Data mining; Data models; Noise measurement; Web pages; XML; XQuery; data extraction; wrapper;
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-61284-180-9
DOI :
10.1109/FSKD.2011.6019852