Title :
A Focused Crawler Based on Naive Bayes Classifier
Author :
Wang, Wenxian ; Chen, Xingshu ; Zou, Yongbin ; Wang, Haizhou ; Dai, Zongkun
Author_Institution :
Network & Trusted Comput. Inst., Sichuan Univ., Chengdu, China
Abstract :
The exponential growth of information on the World Wide Web makes it increasingly difficult to find data relevant to a specific topic. This has driven growing interest in focused crawlers, programs that traverse the Internet by selecting pages relevant to a predefined topic and ignoring the rest. This paper proposes a new focused crawler based on a Naive Bayes classifier: it uses an improved TF-IDF algorithm to extract features from page content and applies the Bayes classifier to compute the page rank. The crawler was compared with a BFS crawler and a PageRank crawler, and the results show that it outperforms both in harvest ratio.
Keywords :
Bayes methods; Internet; search engines; TF-IDF algorithm; World Wide Web; exponential growth; focused crawler; naive Bayes classifier; Crawlers; Information analysis; Information security; Taxonomy; Uniform resource locators; Web pages; Web sites; Classifier; Naive Bayes; TF-IDF
Conference_Title :
Intelligent Information Technology and Security Informatics (IITSI), 2010 Third International Symposium on
Conference_Location :
Jinggangshan
Print_ISBN :
978-1-4244-6730-3
Electronic_ISBN :
978-1-4244-6743-3
DOI :
10.1109/IITSI.2010.30
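
Illustration :
The abstract describes scoring pages with TF-IDF features and a Naive Bayes classifier, then evaluating the crawler by harvest ratio. The following is a minimal sketch of that general idea, not the authors' implementation: the paper's "improved TF-IDF" is not specified in this record, so a standard smoothed TF-IDF is assumed, and all names (tokenize, TfidfNaiveBayes, relevance, harvest_ratio) and the toy training data are hypothetical. Harvest ratio is taken here in its usual sense, the fraction of crawled pages that are relevant.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF-weighted term counts.

    Assumption: standard smoothed TF-IDF, not the paper's improved variant.
    """

    def fit(self, docs, labels):
        token_lists = [tokenize(d) for d in docs]
        n_docs = len(token_lists)
        # Document frequency: number of documents containing each term.
        df = Counter(t for toks in token_lists for t in set(toks))
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0
                    for t, c in df.items()}
        self.vocab = set(df)
        self.class_counts = Counter(labels)
        # Accumulated TF-IDF weight of each term within each class.
        self.term_weights = defaultdict(Counter)
        for toks, label in zip(token_lists, labels):
            for term, tf in Counter(toks).items():
                self.term_weights[label][term] += tf * self.idf[term]
        self.totals = {c: sum(self.term_weights[c].values())
                       for c in self.class_counts}
        return self

    def log_scores(self, text):
        """Unnormalized log posterior for each class."""
        counts = Counter(t for t in tokenize(text) if t in self.vocab)
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        scores = {}
        for c in self.class_counts:
            s = math.log(self.class_counts[c] / n)  # class prior
            for term, tf in counts.items():
                # Laplace-smoothed term likelihood for this class.
                p = (self.term_weights[c][term] + 1.0) / (self.totals[c] + v)
                s += tf * self.idf[term] * math.log(p)
            scores[c] = s
        return scores

    def relevance(self, text, positive="relevant"):
        """Posterior probability of the positive class, in [0, 1]."""
        scores = self.log_scores(text)
        m = max(scores.values())  # subtract max for numerical stability
        exp = {c: math.exp(s - m) for c, s in scores.items()}
        return exp[positive] / sum(exp.values())


def harvest_ratio(relevance_flags):
    """Fraction of crawled pages judged relevant (0.0 if none crawled)."""
    if not relevance_flags:
        return 0.0
    return sum(relevance_flags) / len(relevance_flags)


# Toy training set (hypothetical data, for illustration only).
train_docs = [
    "python crawler web spider index",
    "focused crawler topic web pages",
    "football match goal score",
    "recipe cake sugar flour",
]
train_labels = ["relevant", "relevant", "other", "other"]
clf = TfidfNaiveBayes().fit(train_docs, train_labels)
```

In a focused crawler, a score such as clf.relevance(page_text) could be used to order the URL frontier so that links found on high-scoring pages are fetched first, and harvest_ratio could be tracked over the pages fetched so far to compare crawling strategies.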