• DocumentCode
    2137529
  • Title
    A Comparison over Focused Web Crawling Strategies
  • Author
    Avraam, Ioannis ; Anagnostopoulos, Ioannis
  • Author_Institution
    Dept. of Math., Aristotle Univ. of Thessaloniki, Thessaloniki, Greece
  • fYear
    2011
  • fDate
    Sept. 30 2011-Oct. 2 2011
  • Firstpage
    245
  • Lastpage
    249
  • Abstract
    In this paper we review and compare focused crawling strategies studied and published during the past decade. Despite giant leaps in communication, storage and computing power in recent years, crawlers have always struggled to keep up with Web content generation and modification. Focused crawlers attempt to i) accelerate the crawling process, ii) maximize the harvest of high-quality pages, and iii) assign appropriate credit to different documents along a crawling path, such that short-term gains are not pursued at the expense of less obvious paths that ultimately yield larger sets of valuable pages. Beyond reviewing and comparing these strategies, we also propose additions to the corresponding architectures for further research.
  • Keywords
    Internet; document handling; Web content generation; Web content modification; crawling path; focused Web crawling strategies; Collaboration; Context; Crawlers; Feature extraction; Web pages; World Wide Web; Focused crawling; adaptive crawling; context graphs; location-based Web search;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Informatics (PCI), 2011 15th Panhellenic Conference on
  • Conference_Location
    Kastoria
  • Print_ISBN
    978-1-61284-962-1
  • Type
    conf
  • DOI
    10.1109/PCI.2011.53
  • Filename
    6065096