• DocumentCode
    189289
  • Title
    Comparison of Scheduling Algorithms for Domain Specific Web Crawler
  • Author
    Filipowski, Krzysztof
  • Author_Institution
    Dept. of Comput. Syst. & Networks, Wroclaw Univ. of Technol., Wroclaw, Poland
  • fYear
    2014
  • fDate
    29-30 Sept. 2014
  • Firstpage
    69
  • Lastpage
    74
  • Abstract
    Domain-specific Web crawlers are effective tools for acquiring information from the Web. One of the most crucial factors influencing the efficiency of domain crawlers is the choice of crawling strategy. This article describes and compares several strategies for domain-specific Web crawling. It concentrates on scheduling algorithms, which determine the order in which URLs collected by the crawler are crawled. The objective of these strategies is to download the most relevant Web pages at an early stage of the crawl. The paper presents four different algorithms and compares them using several metrics.
  • Keywords
    Internet; Web sites; information retrieval; scheduling; Web pages; domain specific Web crawler; scheduling algorithms; Algorithm design and analysis; Crawlers; Search engines; Search problems; Uniform resource locators; Best N-First Search; Best-First Search; Domain Specific Crawling; Exploration
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Network Intelligence Conference (ENIC), 2014 European
  • Conference_Location
    Wroclaw
  • Type
    conf
  • DOI
    10.1109/ENIC.2014.14
  • Filename
    6984893