DocumentCode :
2579026
Title :
An Approach for Crawling Dynamic WebPages Based on Script Language Analysis
Author :
Yao, Zhang ; Daling, Wang ; Shi, Feng ; Yifei, Zhang ; Fangling, Leng
Author_Institution :
Sch. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
fYear :
2012
fDate :
16-18 Nov. 2012
Firstpage :
35
Lastpage :
38
Abstract :
Traditional Web crawlers start from the URLs of one or more initial Web pages, continuously extract new URLs, and then access the data of those pages. AJAX, one of the core technologies of Web 2.0, greatly improves the response efficiency of Web applications and the user experience, and has therefore been widely adopted. However, because AJAX breaks the traditional Web architecture, which is based on static pages, traditional Web crawlers cannot cope with dynamic partial refresh and asynchronous loading. In this paper, we propose an efficient approach for crawling the information in dynamic pages by analyzing the script language; it uses a path repository and checks the page-refresh state to improve the accuracy and efficiency of the algorithm. Experimental evaluation shows the efficiency and effectiveness of our approach.
Keywords :
Internet; Java; XML; authoring languages; AJAX; URL; Web 2.0; Web crawlers; dynamic Web pages crawling; path repository; script language analysis; Algorithm design and analysis; Browsers; Crawlers; HTML; Heuristic algorithms; Servers; Switches; Web crawler; asynchronous loading; dynamic scripts analysis;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Web Information Systems and Applications Conference (WISA), 2012 Ninth
Conference_Location :
Haikou
Print_ISBN :
978-1-4673-3054-1
Type :
conf
DOI :
10.1109/WISA.2012.34
Filename :
6385179