• DocumentCode
    3300672
  • Title

    Information Extraction from Semi-structured WEB Page Based on DOM Tree and its Application in Scientific Literature Statistical Analysis System

  • Author

    Li Weidong ; Dong Yibing ; Wang Ruijiang ; Tian Hongxia

  • Author_Institution
    Sch. of Inf. Technol., Hebei Univ. of Econ. & Bus., Shijiazhuang, China
  • fYear
    2009
  • fDate
    11-12 July 2009
  • Firstpage
    124
  • Lastpage
    127
  • Abstract
    To extract information automatically from semi-structured Web pages, this paper proposes a method named IESS that discovers the record model based on the DOM tree and the maximal similar sub tree. The method identifies records automatically and correctly even when records of the same type differ in their expression models. To test the performance of the method, a scientific literature statistical analysis system was designed. In practice, the system helps users quickly understand the distribution of papers in their field of retrieval and grasp their importance.
  • Keywords
    Web sites; information retrieval; statistical analysis; tree data structures; DOM tree; IESS method; information extraction; maximal similar sub tree; scientific literature statistical analysis system; semistructured Web page; Application software; Conference management; Data mining; Databases; Engineering management; HTML; Information management; Statistical analysis; Technology management; Web pages; Automatic information extraction; DOM; Scientific Literature; Statistical Analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Services Science, Management and Engineering, 2009. SSME '09. IITA International Conference on
  • Conference_Location
    Zhangjiajie
  • Print_ISBN
    978-0-7695-3729-0
  • Type

    conf

  • DOI
    10.1109/SSME.2009.59
  • Filename
    5233332