• DocumentCode
    468231
  • Title
    An Application of Improved PageRank in Focused Crawler
  • Author
    Zhang, Yulian; Yin, Chunxia; Yuan, Fuyong
  • Author_Institution
    Yanshan Univ., Qinhuangdao
  • Volume
    2
  • fYear
    2007
  • fDate
    24-27 Aug. 2007
  • Firstpage
    331
  • Lastpage
    335
  • Abstract
    The focused crawler of a special-purpose search engine aims to selectively seek out pages that are relevant to a pre-defined set of topics, rather than to explore all regions of the Web. The PageRank algorithm is often used for ranking web pages, and it is also used for URL ordering in focused crawlers. It estimates a page's authority by taking into account the link structure of the Web. However, it assigns each outlink the same weight and is independent of topics, which results in topic drift. In this paper, we propose an improved PageRank algorithm, which we call "To-PageRank", and then present a crawling strategy that combines the To-PageRank algorithm with the topic similarity of the hyperlink metadata. Experiments with a focused crawler show that the new crawling strategy performs better than the breadth-first and PageRank algorithms.
  • Keywords
    Internet; meta data; search engines; PageRank algorithm; URL ordering; Web pages ranking; focused crawler; hyperlink metadata; special-purpose search engine; Couplings; Crawlers; Educational institutions; Explosives; Information science; Power engineering and energy; Search engines; Uniform resource locators; Web pages; Web search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2007. FSKD 2007. Fourth International Conference on
  • Conference_Location
    Haikou
  • Print_ISBN
    978-0-7695-2874-8
  • Type
    conf
  • DOI
    10.1109/FSKD.2007.142
  • Filename
    4406097