DocumentCode :
3230585
Title :
A Method for Focused Crawling Using Combination of Link Structure and Content Similarity
Author :
Jamali, Mohsen ; Sayyadi, Hassan ; Hariri, Babak Bagheri ; Abolhassani, Hassan
Author_Institution :
Comput. Eng. Dept., Sharif Univ. of Technol., Tehran
fYear :
2006
fDate :
18-22 Dec. 2006
Firstpage :
753
Lastpage :
756
Abstract :
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. A focused crawler aims to selectively seek out pages that are relevant to a pre-defined set of topics. Besides specifying topics by keywords, it is also customary to use exemplary documents to compute the similarity of a given Web document to the topic. In this paper we introduce a new hybrid focused crawler, which uses the link structure of documents as well as the similarity of pages to the topic to crawl the Web.
Keywords :
Internet; document handling; information retrieval; search engines; Web document link structure; Web document similarity; World-Wide Web; content similarity; focused Web crawler; focused Web crawling method; search engine; Crawlers; Feedback; Intelligent structures; Laboratories; Marine animals; Motorcycles; Ontologies; Search engines; Uniform resource locators; Web sites;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2747-7
Type :
conf
DOI :
10.1109/WI.2006.19
Filename :
4061466