DocumentCode
3230585
Title
A Method for Focused Crawling Using Combination of Link Structure and Content Similarity
Author
Jamali, Mohsen ; Sayyadi, Hassan ; Hariri, Babak Bagheri ; Abolhassani, Hassan
Author_Institution
Comput. Eng. Dept., Sharif Univ. of Technol., Tehran
fYear
2006
fDate
18-22 Dec. 2006
Firstpage
753
Lastpage
756
Abstract
The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. A focused crawler aims to selectively seek out pages that are relevant to a pre-defined set of topics. Besides specifying topics by keywords, it is customary to also use exemplary documents to compute the similarity of a given Web document to the topic. In this paper we introduce a new hybrid focused crawler, which uses the link structure of documents as well as the similarity of pages to the topic to crawl the Web.
Keywords
Internet; document handling; information retrieval; search engines; Web document link structure; Web document similarity; World-Wide Web; content similarity; focused Web crawler; focused Web crawling method; search engine; Crawlers; Feedback; Intelligent structures; Laboratories; Marine animals; Motorcycles; Ontologies; Search engines; Uniform resource locators; Web sites;
fLanguage
English
Publisher
ieee
Conference_Titel
Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
Conference_Location
Hong Kong
Print_ISBN
0-7695-2747-7
Type
conf
DOI
10.1109/WI.2006.19
Filename
4061466