• DocumentCode
    2192293
  • Title
    A Focused Crawler Based on Naive Bayes Classifier
  • Author
    Wang, Wenxian ; Chen, Xingshu ; Zou, Yongbin ; Wang, Haizhou ; Dai, Zongkun
  • Author_Institution
    Network & Trusted Comput. Inst., Sichuan Univ., Chengdu, China
  • fYear
    2010
  • fDate
    2-4 April 2010
  • Firstpage
    517
  • Lastpage
    521
  • Abstract
    The exponential growth of information on the World Wide Web makes it increasingly difficult to find relevant data on a specific topic. This has driven growing interest in focused crawlers, programs that traverse the Internet by selecting pages relevant to a predefined topic and ignoring the rest. This paper proposes a new focused crawler based on a Naive Bayes classifier: it uses an improved TF-IDF algorithm to extract features from page content and a Bayes classifier to compute the page rank. The crawler was compared with a BFS (breadth-first search) crawler and a PageRank crawler, and the results show that it achieves a better harvest ratio than both.
  • Keywords
    Bayes methods; Internet; search engines; TF-IDF algorithm; World Wide Web; exponential growth; focused crawler; naive Bayes classifier; Crawlers; Information analysis; Information security; Taxonomy; Uniform resource locators; Web pages; Web sites; Classifier; Naive Bayes; TF-IDF
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Information Technology and Security Informatics (IITSI), 2010 Third International Symposium on
  • Conference_Location
    Jinggangshan
  • Print_ISBN
    978-1-4244-6730-3
  • Electronic_ISBN
    978-1-4244-6743-3
  • Type
    conf
  • DOI
    10.1109/IITSI.2010.30
  • Filename
    5453607
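
The pipeline summarized in the abstract (TF-IDF features, a Naive Bayes relevance score, and evaluation by harvest ratio) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not specify the "improved" TF-IDF variant, so standard smoothed TF-IDF is assumed here, and every name below (`TfidfNaiveBayes`, `crawl`, `TRAIN_DOCS`, `TOY_WEB`, the 0.5 threshold) is hypothetical. Harvest ratio is taken in its usual sense: the fraction of crawled pages judged relevant.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF-weighted term counts (assumed variant)."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(tokenized)
        # Document frequency per term, turned into a smoothed IDF weight.
        df = Counter(t for toks in tokenized for t in set(toks))
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
        self.classes = sorted(set(labels))
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [toks for toks, y in zip(tokenized, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / n_docs)
            weights = Counter()
            for toks in class_docs:
                for t, tf in Counter(toks).items():
                    weights[t] += tf * self.idf[t]
            # Laplace smoothing: add one pseudo-count per vocabulary term.
            total = sum(weights.values()) + len(self.idf)
            self.log_likelihood[c] = {
                t: math.log((weights[t] + 1.0) / total) for t in self.idf
            }
        return self

    def relevance(self, text, positive="relevant"):
        """Return P(positive | text); out-of-vocabulary tokens are ignored."""
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for t, tf in counts.items():
                s += tf * self.idf[t] * self.log_likelihood[c][t]
            scores[c] = s
        # Normalize log scores into a probability in a numerically stable way.
        top = max(scores.values())
        norm = sum(math.exp(v - top) for v in scores.values())
        return math.exp(scores[positive] - top) / norm


def crawl(seed, web, model, budget, threshold=0.5):
    """Best-first crawl of an in-memory web: {url: (text, [out_links])}."""
    frontier = [(-1.0, seed)]  # heapq is a min-heap, so priorities are negated
    seen = {seed}
    visited, relevant = [], 0
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        text, links = web[url]
        score = model.relevance(text)
        visited.append(url)
        relevant += int(score >= threshold)
        for link in links:
            if link in web and link not in seen:
                seen.add(link)
                # An unfetched link inherits the score of the page that cites it.
                heapq.heappush(frontier, (-score, link))
    harvest_ratio = relevant / len(visited) if visited else 0.0
    return visited, harvest_ratio


# Hypothetical toy data, for illustration only.
TRAIN_DOCS = [
    "python crawler spider web index",
    "focused crawler classifier topic",
    "football match goal score",
    "recipe cooking pasta sauce",
]
TRAIN_LABELS = ["relevant", "relevant", "other", "other"]
TOY_WEB = {
    "seed": ("crawler topic web", ["a", "b"]),
    "a": ("focused crawler classifier", ["c"]),
    "b": ("football goal recipe", ["d"]),
    "c": ("spider index web crawler", []),
    "d": ("pasta sauce cooking", []),
}

model = TfidfNaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
visited, harvest_ratio = crawl("seed", TOY_WEB, model, budget=5)
```

Scoring an unfetched link by its parent page's relevance is one common design choice for focused crawlers; a BFS baseline would replace the priority queue with a plain FIFO queue and leave the harvest-ratio computation unchanged.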