• DocumentCode
    1628355
  • Title
    Extracting Objects from the Web
  • Author
    Nie, Zaiqing ; Wu, Fei ; Wen, Ji-Rong ; Ma, Wei-Ying
  • Author_Institution
    Microsoft Research Asia
  • fYear
    2006
  • Firstpage
    123
  • Lastpage
    123
  • Abstract
    Extracting and integrating object information from the Web is of great significance for Web data management. Existing Web information extraction techniques cannot provide a satisfactory solution to the Web object extraction task, because objects of the same type are distributed across diverse Web sources whose structures are highly heterogeneous. In this paper, we propose a novel approach called Object-Level Information Extraction (OLIE) to extract Web objects. This approach extends a classic information extraction algorithm, Conditional Random Fields (CRF), by adding Web-specific information. The experimental results show that OLIE can significantly improve Web object extraction accuracy.
  • Keywords
    Asia; Data engineering; Data mining; Design methodology; HTML; Process design; Search engines; Spatial databases; Visual databases; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Engineering, 2006. ICDE '06. Proceedings of the 22nd International Conference on
  • Print_ISBN
    0-7695-2570-9
  • Type
    conf
  • DOI
    10.1109/ICDE.2006.69
  • Filename
    1617491