DocumentCode
553192
Title
Construct the XQuery-based wrapper for extracting web data
Author
Tiezheng Nie ; Derong Shen ; Ge Yu ; Yue Kou ; Dan Yang
Author_Institution
Coll. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
Volume
3
fYear
2011
fDate
26-28 July 2011
Firstpage
1788
Lastpage
1792
Abstract
Web pages contain a large amount of structured data that many advanced applications require. However, existing extraction approaches lack compatibility with those applications. This paper proposes a web data extraction model that builds an XQuery-based wrapper for extracting data from web pages. We first annotate data values with XPath in the XML documents of sample pages. We then design an algorithm that generates XQuery statements, which extract data from the XML documents and output the results in a structured or semi-structured format. Since XQuery is a well-known standard for operating on XML data and is supported by most database systems and applications, our wrapper is highly compatible with most applications. The experimental results demonstrate that the proposed approach is feasible for extracting web data, which is important for web data integration.
Keywords
Web services; XML; data integrity; data structures; document handling; query processing; Web data extraction; Web data integration; Web pages; XML documents; XPATH; XQuery-based wrapper; data structure; semi-structured format; Accuracy; Data mining; Data models; Noise measurement; XQuery; data extraction; wrapper
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
Conference_Location
Shanghai
Print_ISBN
978-1-61284-180-9
Type
conf
DOI
10.1109/FSKD.2011.6019852
Filename
6019852