• DocumentCode
    3234060
  • Title
    An efficient wrapper for Web data extraction and its application
  • Author
    Zhang, Suzhi; Shi, Peizhong
  • Author_Institution
    Coll. of Comput. & Commun. Eng., Zhengzhou Univ. of Light Ind., Zhengzhou, China
  • fYear
    2009
  • fDate
    25-28 July 2009
  • Firstpage
    1245
  • Lastpage
    1250
  • Abstract
    A Web wrapper extracts data from given Web sources according to the extraction rules that correspond to each source, and wrapper design is a key technology for Web information extraction and integration. This paper describes the design and implementation of a Web wrapper based on a pre-defined schema. It then validates the wrapper by extracting data from the new-book information Web pages of several publishing companies and analyzes the extraction results. We find that the wrapper can accurately extract data from these Web sources, and conclude that the proposed wrapper is feasible, efficient, and maintainable. It will be applied to wrapper/mediator-based Web data integration, on which we rely to develop a Web application for integrating and querying book information.
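    The abstract describes the wrapper only at a high level. As a rough illustration of the general idea (a fixed schema plus per-source extraction rules), the Python sketch below assumes that the pre-defined schema is a simple tuple of field names and that each source's extraction rules are regular expressions. The names BOOK_SCHEMA, SourceRules, and extract_records, as well as the sample page, are hypothetical and are not taken from the paper, whose actual rule language may differ.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical pre-defined schema: the fields every book record should provide.
BOOK_SCHEMA = ("title", "author", "price")


@dataclass
class SourceRules:
    """Hypothetical extraction rules for one Web source (one wrapper per source)."""
    record_pattern: str  # regex matching one whole record block in the page
    field_patterns: Dict[str, str] = field(default_factory=dict)  # field -> regex with one capture group


def extract_records(html: str, rules: SourceRules,
                    schema=BOOK_SCHEMA) -> List[Dict[str, Optional[str]]]:
    """Apply a source's rules to a page and return schema-conformant records."""
    records = []
    for block_match in re.finditer(rules.record_pattern, html, re.S):
        block = block_match.group(0)
        record: Dict[str, Optional[str]] = {}
        for name in schema:
            pattern = rules.field_patterns.get(name)
            match = re.search(pattern, block, re.S) if pattern else None
            # Fields the page does not supply are kept as None,
            # so every record has the same shape as the schema.
            record[name] = match.group(1).strip() if match else None
        records.append(record)
    return records


# Hypothetical sample page and rules for a single publisher's site.
page = """
<div class="book"><h3>Sample Title A</h3><span class="author">Author A</span><em>59.00</em></div>
<div class="book"><h3>Sample Title B</h3><span class="author">Author B</span></div>
"""

rules = SourceRules(
    record_pattern=r'<div class="book">.*?</div>',
    field_patterns={
        "title": r"<h3>(.*?)</h3>",
        "author": r'<span class="author">(.*?)</span>',
        "price": r"<em>(.*?)</em>",
    },
)

if __name__ == "__main__":
    for rec in extract_records(page, rules):
        print(rec)
```

    Keeping the schema separate from the per-source rules means a new publisher's site only needs a new SourceRules instance, which is one plausible reading of the maintainability claim; the mediator side would then query records that all share the same fields.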
  • Keywords
    Internet; data handling; query processing; Web data extraction; Web data integration; Web information extraction; Web information integration; Web sources; Web wrapper; books information Web pages; query system; Application software; Books; Computer science; Computer science education; Data engineering; Data mining; Displays; Educational institutions; HTML; Web pages; Web data integration; Wrapper; extraction rule; information extraction; new book information;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science & Education, 2009. ICCSE '09. 4th International Conference on
  • Conference_Location
    Nanning
  • Print_ISBN
    978-1-4244-3520-3
  • Electronic_ISBN
    978-1-4244-3521-0
  • Type
    conf
  • DOI
    10.1109/ICCSE.2009.5228403
  • Filename
    5228403