• DocumentCode
    3185910
  • Title

    Identification and characterization of crawlers through analysis of web logs

  • Author

    Algiriyage, Nilani ; Jayasena, Sanath ; Dias, Guilherme ; Perera, Amitha ; Dayananda, Kushan

  • Author_Institution
    Univ. of Moratuwa, Moratuwa, Sri Lanka
  • fYear
    2013
  • fDate
    17-20 Dec. 2013
  • Firstpage
    150
  • Lastpage
    155
  • Abstract
    Web crawlers are software programs that automatically traverse the hyperlink structure of the World Wide Web in order to locate and retrieve information. In addition to search engine crawlers, we observed many other crawlers that may gather business intelligence or confidential information, or even execute attacks based on the gathered information, while camouflaging their identity. It is therefore important for website owners to know who has crawled their site and what those crawlers did. In this study, we analyzed crawler patterns in web server logs, developed a methodology to identify crawlers, and classified them into three categories. To evaluate our methodology, we used seven test crawler scenarios. We found that approximately 53.25% of web crawler sessions were from “known” crawlers and that 34.16% exhibited suspicious behavior.
  • Keywords
    Internet; Web sites; competitive intelligence; information retrieval; search engines; software engineering; Web crawlers; Web logs; World Wide Web; business intelligence; confidential information; hyperlink structure; software programs; Browsers; Crawlers; HTML; IP networks; Robots; Web servers; Web Crawler Detection; Web Server Access Logs
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Industrial and Information Systems (ICIIS), 2013 8th IEEE International Conference on
  • Conference_Location
    Peradeniya
  • Print_ISBN
    978-1-4799-0908-7
  • Type

    conf

  • DOI
    10.1109/ICIInfS.2013.6731972
  • Filename
    6731972