• DocumentCode
    3213432
  • Title

    Building intelligent systems for mining information extraction rules from web pages by using domain knowledge

  • Author

    Seo, Heekyoung ; Yang, Jaeyoung ; Choi, Joongmin

  • Author_Institution
    HCI Lab., Samsung Adv. Inst. of Technol., South Korea
  • Volume
    1
  • fYear
    2001
  • fDate
    2001
  • Firstpage
    322
  • Abstract
    Previous research on automatic information extraction experienced difficulties in acquiring and representing useful domain knowledge and in coping with the structural heterogeneity among different information sources. As a result, many real-world information sources with complex document structures could not be correctly analyzed. In order to resolve these problems, this paper presents a method of building intelligent systems for mining information extraction rules from semi-structured Web pages by using domain knowledge. This system automatically generates a wrapper for each information source and performs information extraction and information integration by applying this wrapper to the corresponding source. Both the domain knowledge and the wrapper are represented by XML documents to increase flexibility and interoperability. By testing our prototype system on several real-estate information sites, we can claim that it creates the correct wrappers for most Web sources and consequently facilitates effective information extraction for heterogeneous information sources.
  • Keywords
    artificial intelligence; data mining; information resources; information retrieval; knowledge based systems; open systems; XML documents; automatic information extraction; complex document structures; domain knowledge; heterogeneous information sources; information extraction; information integration; information source; intelligent systems; interoperability; mining information extraction rules; real-estate information sites; real-world information sources; semi-structured Web pages; structural heterogeneity; web pages; Computer science; Data mining; HTML; Human computer interaction; Information analysis; Intelligent structures; Intelligent systems; Knowledge engineering; Web pages; XML;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Industrial Electronics, 2001. Proceedings. ISIE 2001. IEEE International Symposium on
  • Conference_Location
    Pusan
  • Print_ISBN
    0-7803-7090-2
  • Type

    conf

  • DOI
    10.1109/ISIE.2001.931807
  • Filename
    931807