DocumentCode
2192293
Title
A Focused Crawler Based on Naive Bayes Classifier
Author
Wang, Wenxian ; Chen, Xingshu ; Zou, Yongbin ; Wang, Haizhou ; Dai, Zongkun
Author_Institution
Network & Trusted Comput. Inst., Sichuan Univ., Chengdu, China
fYear
2010
fDate
2-4 April 2010
Firstpage
517
Lastpage
521
Abstract
The exponential growth of information on the World Wide Web makes it increasingly difficult to find data relevant to a specific topic. This has driven growing interest in focused crawlers, programs that traverse the Internet by selecting pages relevant to a predefined topic and ignoring the rest. This paper proposes a new focused crawler based on a Naive Bayes classifier: it uses an improved TF-IDF algorithm to extract features from page content and the Bayes classifier to compute each page's rank. The crawler is compared with a BFS crawler and a PageRank crawler, and the results show that it achieves a higher harvest ratio than both.
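The pipeline named in the abstract (TF-IDF feature extraction, Naive Bayes scoring, harvest-ratio evaluation) can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not describe their improved TF-IDF variant, so plain TF-IDF stands in for it here, and every name (`tokenize`, `top_terms`, `NaiveBayes`, `rank_pages`, `harvest_ratio`) is hypothetical. Harvest ratio is taken in its usual sense, the fraction of fetched pages that are relevant.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, whitespace-split, keep alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def top_terms(docs, k):
    """Pick the k terms with the highest plain TF-IDF weight in any document.

    Assumption: standard TF-IDF, not the paper's improved variant.
    """
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    best = {}
    for d in docs:
        for t, c in Counter(d).items():
            weight = (c / len(d)) * math.log(n / df[t])
            best[t] = max(best.get(t, 0.0), weight)
    return set(sorted(best, key=best.get, reverse=True)[:k])


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over a fixed vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior, self.log_like = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(t for d in class_docs for t in d if t in vocab)
            total = sum(counts.values()) + len(vocab)
            self.log_like[c] = {t: math.log((counts[t] + 1) / total) for t in vocab}
        return self

    def prob(self, doc, target):
        """Posterior probability that a tokenized doc belongs to class `target`."""
        scores = {
            c: self.log_prior[c]
            + sum(self.log_like[c][t] for t in doc if t in self.vocab)
            for c in self.classes
        }
        m = max(scores.values())  # subtract the max for numerical stability
        z = sum(math.exp(s - m) for s in scores.values())
        return math.exp(scores[target] - m) / z


def rank_pages(clf, pages, topic):
    """Order URLs by descending estimated relevance to `topic`."""
    return sorted(
        pages, key=lambda url: clf.prob(tokenize(pages[url]), topic), reverse=True
    )


def harvest_ratio(relevant_flags):
    """Fraction of fetched pages judged relevant (0.0 if nothing was fetched)."""
    return sum(relevant_flags) / len(relevant_flags) if relevant_flags else 0.0


# Toy training data, invented purely for illustration.
TRAIN = [
    ("crawler fetches web pages links", "topic"),
    ("focused crawler ranks web pages", "topic"),
    ("recipe for chicken soup", "other"),
    ("soup with chicken and noodles", "other"),
]
DOCS = [tokenize(text) for text, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
CLF = NaiveBayes().fit(DOCS, LABELS, top_terms(DOCS, 20))
```

A crawler built this way would fetch candidate URLs in the order `rank_pages` returns, so that pages the classifier considers on-topic are visited first. A BFS crawler, by contrast, visits them in discovery order.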
Keywords
Bayes methods; Internet; search engines; TF-IDF algorithm; World Wide Web; exponential growth; focused crawler; naive Bayes classifier; Crawlers; Information analysis; Information security; Taxonomy; Uniform resource locators; Web pages; Web sites; Classifier; Naive Bayes; TF-IDF
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Information Technology and Security Informatics (IITSI), 2010 Third International Symposium on
Conference_Location
Jinggangshan
Print_ISBN
978-1-4244-6730-3
Electronic_ISBN
978-1-4244-6743-3
Type
conf
DOI
10.1109/IITSI.2010.30
Filename
5453607