• DocumentCode
    3230585
  • Title
    A Method for Focused Crawling Using Combination of Link Structure and Content Similarity
  • Author
    Jamali, Mohsen ; Sayyadi, Hassan ; Hariri, Babak Bagheri ; Abolhassani, Hassan
  • Author_Institution
    Comput. Eng. Dept., Sharif Univ. of Technol., Tehran
  • fYear
    2006
  • fDate
    18-22 Dec. 2006
  • Firstpage
    753
  • Lastpage
    756
  • Abstract
    The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. A focused crawler aims to selectively seek out pages that are relevant to a pre-defined set of topics. Besides specifying topics by keywords, it is customary to also use exemplary documents to compute the similarity of a given Web document to the topic. In this paper we introduce a new hybrid focused crawler, which uses the link structure of documents as well as the similarity of pages to the topic to crawl the Web.
  • Keywords
    Internet; document handling; information retrieval; search engines; Web document link structure; Web document similarity; World-Wide Web; content similarity; focused Web crawler; focused Web crawling method; search engine; Crawlers; Feedback; Intelligent structures; Laboratories; Marine animals; Motorcycles; Ontologies; Search engines; Uniform resource locators; Web sites;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    0-7695-2747-7
  • Type
    conf
  • DOI
    10.1109/WI.2006.19
  • Filename
    4061466