• DocumentCode
    2113365
  • Title
    A Semantic DOM Approach for Webpage Information Extraction
  • Author
    Fei, Yulian; Luo, Zongwei; Xu, Yun; Zhang, Winston
  • Author_Institution
    Comput. Sci. & Inf. Eng. Inst., Zhejiang Gongshang Univ., Hangzhou, China
  • fYear
    2009
  • fDate
    20-22 Sept. 2009
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    With the development of electronic technology and e-commerce, Web page technology has attracted considerable research effort and has recently become one of the most active topics. This paper proposes a semantic DOM (SDOM) approach for extracting information from e-commerce Web pages. By combining content and structure information, the approach achieves good precision and recall, as shown in our experiments on listpage and tablepage data sets.
  • Keywords
    Web sites; electronic commerce; Web page information extraction; document object model; e-commerce; listpage dataset; semantic DOM approach; tablepage data set; Computer science; Data mining; Facebook; HTML; Information services; Internet; Machine learning; Tree data structures; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Management and Service Science, 2009. MASS '09. International Conference on
  • Conference_Location
    Wuhan
  • Print_ISBN
    978-1-4244-4638-4
  • Electronic_ISBN
    978-1-4244-4639-1
  • Type
    conf
  • DOI
    10.1109/ICMSS.2009.5302541
  • Filename
    5302541