• DocumentCode
    2579026
  • Title
    An Approach for Crawling Dynamic WebPages Based on Script Language Analysis
  • Author
    Yao, Zhang ; Daling, Wang ; Shi, Feng ; Yifei, Zhang ; Fangling, Leng

  • Author_Institution
    Sch. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
  • fYear
    2012
  • fDate
    16-18 Nov. 2012
  • Firstpage
    35
  • Lastpage
    38
  • Abstract
    Traditional Web crawlers start from the URLs of one or more initial Web pages, continuously extract new URLs from them, and then access the data of those pages. AJAX, one of the core technologies of Web 2.0, greatly improves the response efficiency of Web applications and delivers a good user experience, and has therefore been widely adopted. However, because AJAX techniques break the traditional Web page architecture, which is based on static pages, traditional Web crawlers cannot cope with dynamic partial refresh and asynchronous loading. In this paper, we propose an efficient approach for crawling the information in dynamic pages by analyzing script language; it uses a path repository and judges the page refreshing state to improve the accuracy and efficiency of the algorithm. Experimental evaluation shows the efficiency and effectiveness of our approach.
  • Keywords
    Internet; Java; XML; authoring languages; AJAX; URL; Web 2.0; Web crawlers; dynamic Web pages crawling; path repository; script language analysis; Algorithm design and analysis; Browsers; Crawlers; HTML; Heuristic algorithms; Servers; Switches; Web crawler; asynchronous loading; dynamic scripts analysis;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Information Systems and Applications Conference (WISA), 2012 Ninth
  • Conference_Location
    Haikou
  • Print_ISBN
    978-1-4673-3054-1
  • Type
    conf
  • DOI
    10.1109/WISA.2012.34
  • Filename
    6385179