DocumentCode :
2252120
Title :
An Analysis of URLs Generated from JavaScript Code
Author :
Zhou, Jingyu ; Ding, Yu
Author_Institution :
Sch. of Software, Shanghai Jiao Tong Univ., Shanghai, China
fYear :
2012
fDate :
May 30 2012-June 1 2012
Firstpage :
688
Lastpage :
693
Abstract :
Search engines use a crawling system to recursively download web pages, analyze their HTML, and generate a new list of URLs to crawl. As web pages become more dynamic, JavaScript is heavily used, which poses a great challenge for crawling systems: many URLs are now embedded in JavaScript code and are invisible to the crawler. Worse, there has been no study of the usage patterns of these URLs, and the impact of JavaScript-generated URLs is unknown. We propose a browser emulation method to study the usage of URLs from JavaScript code. To find these URLs, we instrument a browser core to output all URLs inside a web page, including those generated from JavaScript. We then classify these URLs into a number of types and study the reasons web developers put them in JavaScript. We analyze top Internet sites and popular web pages. The results show that more than half of them contain URLs generated from JavaScript, which account for about 6-19% of total URLs. Among these, 26-41% refer to potentially important content that should be indexed by search engine crawlers, and about 26-35% are advertising URLs.
Keywords :
Internet; Java; Web sites; hypermedia markup languages; search engines; HTML pages; Internet sites; JavaScript code; URL analysis; Web developers; Web pages; browser emulation method; crawling system; Browsers; Electronic mail; Engines; Google; HTML; Navigation; JavaScript; URL; browser emulation; web crawler
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer and Information Science (ICIS), 2012 IEEE/ACIS 11th International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-1536-4
Type :
conf
DOI :
10.1109/ICIS.2012.28
Filename :
6211134