  • DocumentCode
    2252120
  • Title
    An Analysis of URLs Generated from JavaScript Code
  • Author
    Zhou, Jingyu; Ding, Yu
  • Author_Institution
    Sch. of Software, Shanghai Jiao Tong Univ., Shanghai, China
  • fYear
    2012
  • fDate
    May 30, 2012 - June 1, 2012
  • Firstpage
    688
  • Lastpage
    693
  • Abstract
    Search engines use a crawling system to recursively download web pages, analyze the HTML, and generate a new list of URLs to crawl. As web pages become more dynamic, JavaScript is used heavily, which poses a great challenge for the crawling system: many URLs are now embedded in JavaScript code and are invisible to the crawler. Worse, no study exists on the usage patterns of these URLs, and the impact of JavaScript-generated URLs is unknown. We propose a browser emulation method to study the usage of URLs generated from JavaScript code. To find these URLs, we instrument a browser core to output all URLs inside a web page, including those generated from JavaScript. We then classify these URLs into a number of types and study the reasons that web developers put them in JavaScript. We analyze top Internet sites and popular web pages. The results show that more than half of them contain URLs generated from JavaScript, which account for about 6-19% of total URLs. Among these, 26-41% refer to potentially important content that should be indexed by search engine crawlers, and about 26-35% are advertising URLs.
  • Keywords
    Internet; Java; Web sites; hypermedia markup languages; search engines; HTML pages; Internet sites; JavaScript code; URL analysis; Web developers; Web pages; browser emulation method; crawling system; Browsers; Electronic mail; Engines; Google; HTML; Navigation; JavaScript; URL; browser emulation; web crawler
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Computer and Information Science (ICIS), 2012 IEEE/ACIS 11th International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4673-1536-4
  • Type
    conf
  • DOI
    10.1109/ICIS.2012.28
  • Filename
    6211134