Title :
A novel defense mechanism against web crawlers intrusion
Author :
Aghamohammadi, Alireza ; Eydgahi, Ali
Author_Institution :
Sch. of Eng. Technol., Eastern Michigan Univ., Ypsilanti, MI, USA
Abstract :
Web robots, also known as crawlers or spiders, are used by search engines, hackers, and spammers to gather information about web pages. Timely detection and prevention of unwanted crawlers increases the privacy and security of websites. In this paper, a novel method for identifying web crawlers is proposed to prevent unwanted crawlers from accessing websites. The method uses a five-factor identification process to detect unwanted crawlers. This work provides pretest and posttest results, along with a systematic evaluation of web pages with the proposed identification process versus web pages without it. The outputs of logistic regression analysis for both the treatment and control groups are provided to evaluate the hypotheses and answer the research questions. The experiment uses repeated measures for two groups, each containing the same web pages. The main goal of this work is to address the challenge of identifying and preventing unwanted web crawlers by proposing a novel defense mechanism with an identification process.
Keywords :
Web sites; data privacy; information retrieval; regression analysis; security of data; Web crawlers intrusion; Web page systematic evaluation; Web robots; Web site privacy; Web site security; Web spiders; defense mechanism; five-factor identification process; hackers; logistic regression analysis; search engines; spammers; Crawlers; IP networks; Internet; Robots; Search engines; Servers; Web pages; Privacy; Security; Web crawler detection; Web robot detection; World Wide Web;
Conference_Title :
Electronics, Computer and Computation (ICECCO), 2013 International Conference on
Conference_Location :
Ankara
DOI :
10.1109/ICECCO.2013.6718280