• DocumentCode
    553192
  • Title

    Construct the XQuery-based wrapper for extracting web data

  • Author

    Tiezheng Nie ; Derong Shen ; Ge Yu ; Yue Kou ; Dan Yang

  • Author_Institution
    Coll. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
  • Volume
    3
  • fYear
    2011
  • fDate
    26-28 July 2011
  • Firstpage
    1788
  • Lastpage
    1792
  • Abstract
    Web pages contain a large amount of structured data that many advanced applications require. However, existing approaches lack compatibility with such applications. This paper proposes a web data extraction model that builds an XQuery-based wrapper for extracting data from web pages. We first annotate data values with XPath expressions in the XML documents of sample pages. We then design an algorithm that generates XQuery statements, which extract data from the XML documents and output the results in a structured or semi-structured format. Since XQuery is a well-known standard for operating on XML data and is supported by most database systems and applications, our wrapper is highly compatible with most applications. The experimental results demonstrate that the proposed approach is feasible for extracting web data, which is important for web data integration.
  • Keywords
    Web services; XML; data integrity; data structures; document handling; query processing; Web data extraction; Web data integration; Web pages; XML documents; XPATH; XQuery-based wrapper; data structure; semi-structured format; Accuracy; Data mining; Data models; Noise measurement; Web pages; XML; XQuery; data extraction; wrapper;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery (FSKD), 2011 Eighth International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-61284-180-9
  • Type

    conf

  • DOI
    10.1109/FSKD.2011.6019852
  • Filename
    6019852
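
The abstract describes generating XQuery statements from XPath annotations made on sample pages. The record does not give the paper's algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the annotations have already been reduced to one XPath that selects each record node plus a relative XPath per field; the function name `build_xquery`, its parameters, the element names `results` and `record`, and the sample paths are all hypothetical.

```python
def build_xquery(doc_uri, record_path, fields,
                 root_name="results", record_name="record"):
    """Build an XQuery FLWOR expression as a string.

    doc_uri     -- URI of the XML version of the page (assumed input)
    record_path -- absolute XPath selecting one node per data record
    fields      -- mapping of output element name to an XPath relative
                   to the record node
    """
    lines = [
        f"<{root_name}>{{",
        f'  for $r in doc("{doc_uri}"){record_path}',
        "  return",
        f"    <{record_name}>",
    ]
    # One output element per annotated field, filled from the record node.
    for name, rel_path in fields.items():
        lines.append(f"      <{name}>{{ data($r/{rel_path}) }}</{name}>")
    lines.append(f"    </{record_name}>")
    lines.append(f"}}</{root_name}>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical annotations for a product listing page.
    query = build_xquery(
        "page.xml",
        "//div[@class='item']",
        {"title": "h2/a", "price": "span[@class='price']"},
    )
    print(query)
```

The generated string could then be handed to any XQuery processor, which is where the compatibility argument in the abstract would apply; this sketch only builds the query text and does not execute it.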