Title : 
A Focused Crawler Based on Naive Bayes Classifier
         
        
            Author : 
Wang, Wenxian ; Chen, Xingshu ; Zou, Yongbin ; Wang, Haizhou ; Dai, Zongkun
         
        
            Author_Institution : 
Network and Trusted Computing Institute, Sichuan University, Chengdu, China
         
        
        
        
        
        
            Abstract : 
The exponential growth of information on the World Wide Web makes it increasingly difficult to find data relevant to a specific topic. This has driven growing interest in focused crawlers: programs that traverse the Internet by selecting pages relevant to a predefined topic and ignoring off-topic ones. This paper proposes a new focused crawler based on a Naive Bayes classifier. It uses an improved TF-IDF algorithm to extract features from page content and a Naive Bayes classifier to compute each page's rank. The crawler was compared with a BFS (breadth-first search) crawler and a PageRank crawler, and the results show that it outperforms both in harvest ratio.
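The scoring idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the "improved" TF-IDF variant, so plain smoothed TF-IDF is assumed here, and the class name `TfidfNaiveBayes`, the helper `prioritize`, the labels "relevant"/"irrelevant", and the example.com URLs are all hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF-weighted term counts (illustrative only)."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n_docs = len(docs)
        # Document frequency per term, turned into a smoothed IDF weight.
        df = Counter(t for toks in tokenized for t in set(toks))
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
        self.vocab = set(df)
        class_counts = Counter(labels)
        # Accumulate TF-IDF mass per (class, term).
        self.term_weight = {y: Counter() for y in class_counts}
        for toks, y in zip(tokenized, labels):
            for t, tf in Counter(toks).items():
                self.term_weight[y][t] += tf * self.idf[t]
        self.class_total = {y: sum(self.term_weight[y].values()) for y in class_counts}
        self.class_log_prior = {y: math.log(c / n_docs) for y, c in class_counts.items()}
        return self

    def log_scores(self, text):
        """Unnormalized log-posterior of each class for the given page text."""
        counts = Counter(t for t in tokenize(text) if t in self.vocab)
        v = len(self.vocab)
        scores = {}
        for y, prior in self.class_log_prior.items():
            s = prior
            for t, tf in counts.items():
                # Laplace-smoothed term likelihood for class y.
                p = (self.term_weight[y][t] + 1.0) / (self.class_total[y] + v)
                s += tf * self.idf[t] * math.log(p)
            scores[y] = s
        return scores

    def relevance(self, text, positive="relevant"):
        """Posterior probability of the positive class, used as the page score."""
        scores = self.log_scores(text)
        m = max(scores.values())
        exp = {y: math.exp(s - m) for y, s in scores.items()}
        return exp[positive] / sum(exp.values())


def prioritize(clf, pages):
    """Return URLs ordered from most to least relevant using a priority queue."""
    frontier = [(-clf.relevance(text), url) for url, text in pages.items()]
    heapq.heapify(frontier)
    return [heapq.heappop(frontier)[1] for _ in range(len(frontier))]
```

In a focused crawler, a score like `relevance` would typically decide which fetched pages have their outgoing links followed first, whereas a BFS crawler visits links in discovery order regardless of content.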
         
        
            Keywords : 
Bayes methods; Internet; search engines; TF-IDF algorithm; World Wide Web; exponential growth; focused crawler; naive Bayes classifier; Crawlers; Information analysis; Information security; Taxonomy; Uniform resource locators; Web pages; Web sites; Classifier; Naive Bayes; TF-IDF
         
        
        
        
            Conference_Title : 
2010 Third International Symposium on Intelligent Information Technology and Security Informatics (IITSI)
         
        
            Conference_Location : 
Jinggangshan
         
        
            Print_ISBN : 
978-1-4244-6730-3
         
        
            Electronic_ISBN : 
978-1-4244-6743-3
         
        
        
            DOI : 
10.1109/IITSI.2010.30