• DocumentCode
    2259119
  • Title

    Design and Implementation of a Web Information Extraction System Based on R-G-B Algorithm

  • Author

    Li, Yaoguo ; Sun, Huiye ; Lin, Shan ; Zhu, Mingying

  • Author_Institution
    College of Software, Nankai Univ., Tianjin
  • Volume
    1
  • fYear
    2008
  • fDate
    20-22 Dec. 2008
  • Firstpage
    254
  • Lastpage
    258
  • Abstract
    With the enormous growth of the World Wide Web in recent years, extracting information from web pages efficiently, accurately, and flexibly has become an important challenge for web crawler designers. Unlike many other approaches, the "R-G-B" algorithm is a new algorithm that meets the accuracy and efficiency requirements that search engines place on information extraction. In this paper, we describe the design and implementation of a web information extraction system module based on this algorithm. We present the architecture of the system and report preliminary experimental results showing that the system can address robustness, flexibility, and accuracy at a low cost.
  • Keywords
    Web sites; information retrieval; search engines; R-G-B algorithm; Web information extraction system; Web pages; World Wide Web; Algorithm design and analysis; Costs; Crawlers; Data mining; Hidden Markov models; Robustness; Web mining; Design and Implementation; Information Extraction; Web Crawler
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Technology Application, 2008. IITA '08. Second International Symposium on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-0-7695-3497-8
  • Type

    conf

  • DOI
    10.1109/IITA.2008.388
  • Filename
    4739574