Title :
A Method for Focused Crawling Using Combination of Link Structure and Content Similarity
Author :
Jamali, Mohsen ; Sayyadi, Hassan ; Hariri, Babak Bagheri ; Abolhassani, Hassan
Author_Institution :
Comput. Eng. Dept., Sharif Univ. of Technol., Tehran
Abstract :
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. A focused crawler aims to selectively seek out pages that are relevant to a pre-defined set of topics. Besides specifying topics by keywords, it is customary to also use exemplary documents to compute the similarity of a given Web document to the topic. In this paper we introduce a new hybrid focused crawler, which uses the link structure of documents as well as the similarity of pages to the topic to crawl the Web.
Keywords :
Internet; document handling; information retrieval; search engines; Web document link structure; Web document similarity; World-Wide Web; content similarity; focused Web crawler; focused Web crawling method; search engine; Crawlers; Feedback; Intelligent structures; Laboratories; Marine animals; Motorcycles; Ontologies; Search engines; Uniform resource locators; Web sites;
Conference_Title :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7