• DocumentCode
    2290812
  • Title
    BotSeer: An Automated Information System for Analyzing Web Robots
  • Author
    Sun, Yang ; Councill, Isaac G. ; Giles, C. Lee
  • Author_Institution
    Coll. of Inf. Sci. & Technol., Pennsylvania State Univ., University Park, PA
  • fYear
    2008
  • fDate
    14-18 July 2008
  • Firstpage
    108
  • Lastpage
    114
  • Abstract
    Robots.txt files are vital to the Web since they are supposed to regulate what search engines can and cannot crawl. We present BotSeer, a Web-based information system and search tool that provides resources and services for researching Web robots and trends in Robot exclusion protocol deployment and adherence. BotSeer currently indexes and analyzes 2.2 million robots.txt files obtained from 13.2 million Websites, as well as a large Web server log of real-world robot behavior and related analyses. BotSeer provides three major services: robots.txt searching, robot bias analysis, and robot-generated log analysis. BotSeer serves as a resource for studying the regulation and behavior of Web robots as well as a tool to inform the creation of effective robots.txt files and crawler implementations.
  • Keywords
    Internet; Web sites; information systems; search engines; BotSeer; Robot exclusion protocol; Web robots; Web server log; Websites; automated information system; search engines; Access protocols; Crawlers; Information analysis; Information systems; Robotics and automation; Robots; Search engines; Web page design; Web pages; Web server; robots exclusion protocol; robots.txt; search engine; web crawler;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Engineering, 2008. ICWE '08. Eighth International Conference on
  • Conference_Location
    Yorktown Heights, NY
  • Print_ISBN
    978-0-7695-3261-5
  • Electronic_ISBN
    978-0-7695-3261-5
  • Type
    conf
  • DOI
    10.1109/ICWE.2008.27
  • Filename
    4577874