• DocumentCode
    3006526
  • Title

    Research on Web Information Extraction Based on XML

  • Author

    Hu, Yan ; Xuan, Yanyan

  • Author_Institution
    Dept. Comput. Sci. & Technol., Wuhan Univ. of Technol., Wuhan
  • fYear
    2008
  • fDate
    25-26 Sept. 2008
  • Firstpage
    201
  • Lastpage
    204
  • Abstract
    This paper applies standard XML technology to Web information extraction and proposes a generic XML-based extraction solution. To simplify the extraction work, two key technologies are proposed and implemented: XML-based Web data conversion and DOM-based XPath generation. XSLT is used as the description language for extraction rules, which is conducive to unifying extraction patterns.
  • Keywords
    Internet; XML; information retrieval; DOM-based XPath generation technology; Web information extraction; XML technology; XML-based Web data conversion technology; XSLT description language; extraction rule pattern; Artificial intelligence; Computer science; Data conversion; Data mining; Genetics; HTML; Memory; Optimization methods; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Genetic and Evolutionary Computing, 2008. WGEC '08. Second International Conference on
  • Conference_Location
    Hubei
  • Print_ISBN
    978-0-7695-3334-6
  • Type

    conf

  • DOI
    10.1109/WGEC.2008.16
  • Filename
    4637427
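
  • Illustration
    The abstract names DOM-based XPath generation as one of the two key technologies but this record does not describe how the paper implements it. The following is a minimal sketch of one common approach, not the authors' method: starting from a selected DOM node, walk up to the root and record each element's tag name and its 1-based position among same-named siblings. The function name `xpath_for`, the sample markup in `PAGE`, and the helper `demo` are hypothetical and exist only for this illustration. The page is assumed to have already been converted to well-formed XML, which is the role the abstract assigns to the data conversion step.

```python
from xml.dom import minidom


def xpath_for(node):
    """Build an absolute, position-indexed XPath for a DOM element node."""
    steps = []
    # Walk upward until we leave the element tree (the Document node ends the loop).
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        # Count preceding siblings with the same tag to get the 1-based index.
        index = 1
        sibling = node.previousSibling
        while sibling is not None:
            if (sibling.nodeType == sibling.ELEMENT_NODE
                    and sibling.tagName == node.tagName):
                index += 1
            sibling = sibling.previousSibling
        steps.append(f"{node.tagName}[{index}]")
        node = node.parentNode
    # Steps were collected leaf-first, so reverse them for the final path.
    return "/" + "/".join(reversed(steps))


# Hypothetical page, assumed to be well-formed XML already.
PAGE = (
    "<html><body>"
    "<div><p>intro</p></div>"
    "<div><p>name</p><p>price</p></div>"
    "</body></html>"
)


def demo():
    doc = minidom.parseString(PAGE)
    # Pick the third <p> in document order (the one containing "price").
    target = doc.getElementsByTagName("p")[2]
    return xpath_for(target)


if __name__ == "__main__":
    print(demo())
```

    A path generated this way could then serve as the `select` or `match` expression of an XSLT extraction rule, which is how the abstract describes rules being expressed. Position-indexed paths are simple to generate but break when the page layout changes, so a practical system might prefer attribute-based predicates where they are available.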