• DocumentCode
    1987987
  • Title
    Information mining system design and implementation based on web crawler
  • Author
    Lin, Shan ; Li, You-meng ; Li, Qing-Cheng
  • Author_Institution
    Coll. of Inf. Tech. Sci., Nankai Univ., Tianjin
  • fYear
    2008
  • fDate
    2-4 June 2008
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    With the information explosion caused by the World Wide Web in recent years, how to process this enormous volume of information efficiently and at a reasonable cost has become a concern for information providers, service agencies and end users. While much research focuses on how to design an efficient Web crawler, we turn our attention to how to make the best use of a Web crawler's results. In this paper, we describe the design and implementation of an information mining system that runs on the results of a Web crawler to extract more metadata from unstructured documents for focused search (such as RSS search). We present the software architecture of the system, describe efficient techniques for achieving high performance, and report preliminary experimental results to show that the system can address the issues of robustness, flexibility and accuracy at a low cost.
  • Keywords
    Internet; data mining; document handling; information retrieval; meta data; software architecture; Web crawler; World Wide Web; information mining system; information provider; metadata; service agency; Costs; Crawlers; Educational institutions; Electronic mail; Fuzzy logic; Search engines; Web pages; Web sites; Crawler; RSS; information mining; low cost
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    System of Systems Engineering, 2008. SoSE '08. IEEE International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-1-4244-2172-5
  • Electronic_ISBN
    978-1-4244-2173-2
  • Type
    conf
  • DOI
    10.1109/SYSOSE.2008.4724148
  • Filename
    4724148