DocumentCode :
3362573
Title :
A Web Crawler Detection Algorithm Based on Web Page Member List
Author :
Guo, Weigang ; Zhong, Yong ; Xie, Jianqin
Author_Institution :
Sch. of Electron. & Inf. Eng., Foshan Univ., Foshan, China
Volume :
1
fYear :
2012
fDate :
26-27 Aug. 2012
Firstpage :
189
Lastpage :
192
Abstract :
With the widespread use of search engines, the impact of Web crawlers on Web sites should not be ignored. After analyzing the navigational patterns of Web crawlers in Web logs, a new detection algorithm based on a Web page member list is proposed. The algorithm constructs one member list for every Web page and one show table for every visitor. Experiments show that the new algorithm can detect unknown crawlers and unfriendly crawlers that do not obey the Standard for Robot Exclusion.
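The abstract names two data structures, a member list per Web page and a show table per visitor, without defining them. The following is a minimal sketch of one plausible reading, not the authors' algorithm: it assumes a member list is the set of embedded resources a browser would fetch for a page, that a show table records which of those members a visitor actually requested, and that a visitor who fetches few members is a suspected crawler. The names `MEMBER_LISTS`, `build_show_tables`, `member_fetch_ratio`, `suspected_crawlers`, the use of an IP address as the visitor key, and the 0.5 threshold are all hypothetical.

```python
from collections import defaultdict

# Hypothetical member lists: page URL -> embedded resources a browser would fetch.
MEMBER_LISTS = {
    "/index.html": {"/css/site.css", "/js/app.js", "/img/logo.png"},
    "/news.html": {"/css/site.css", "/img/banner.png"},
}


def build_show_tables(log_entries):
    """Build one show table per visitor from (visitor, url) log entries.

    Assumed structure: a show table maps each page the visitor requested
    to the subset of that page's members the visitor also requested.
    """
    requested = defaultdict(set)
    for visitor, url in log_entries:
        requested[visitor].add(url)

    tables = {}
    for visitor, urls in requested.items():
        tables[visitor] = {
            page: members & urls
            for page, members in MEMBER_LISTS.items()
            if page in urls
        }
    return tables


def member_fetch_ratio(show_table):
    """Fraction of expected members the visitor fetched across visited pages."""
    total = sum(len(MEMBER_LISTS[page]) for page in show_table)
    fetched = sum(len(shown) for shown in show_table.values())
    # A visitor that requested no known page gives no evidence; treat as 1.0.
    return fetched / total if total else 1.0


def suspected_crawlers(log_entries, threshold=0.5):
    """Return visitors whose member fetch ratio falls below the assumed threshold."""
    tables = build_show_tables(log_entries)
    return sorted(v for v, t in tables.items() if member_fetch_ratio(t) < threshold)


# Illustrative log: the first visitor loads a page with its members,
# the second requests only the HTML pages.
SAMPLE_LOG = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.1", "/css/site.css"),
    ("10.0.0.1", "/js/app.js"),
    ("10.0.0.1", "/img/logo.png"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.2", "/news.html"),
]

print(suspected_crawlers(SAMPLE_LOG))
```

Under these assumptions the check does not rely on a visitor reading robots.txt or sending an honest user agent, which is consistent with the abstract's claim about unknown and unfriendly crawlers. A browser with a warm cache would also fetch few members, so a real threshold would need tuning against actual logs.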
Keywords :
Web sites; information retrieval; online front-ends; search engines; Web crawler detection algorithm; Web crawler navigational patterns; Web logs; Web page member list; Web search engines; unfriendly crawler detection; unknown crawler detection; Browsers; Crawlers; HTML; IP networks; Servers; Web pages; Search engine; Web crawler detection; Web log
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2012 4th International Conference on
Conference_Location :
Nanchang, Jiangxi
Print_ISBN :
978-1-4673-1902-7
Type :
conf
DOI :
10.1109/IHMSC.2012.54
Filename :
6305658