DocumentCode
2579026
Title
An Approach for Crawling Dynamic WebPages Based on Script Language Analysis
Author
Yao, Zhang ; Daling, Wang ; Shi, Feng ; Yifei, Zhang ; Fangling, Leng
Author_Institution
Sch. of Inf. Sci. & Eng., Northeastern Univ., Shenyang, China
fYear
2012
fDate
16-18 Nov. 2012
Firstpage
35
Lastpage
38
Abstract
Traditional Web crawlers start from one or more seed URLs, continuously extract new URLs from the pages they fetch, and then access the data of those pages. AJAX, one of the core technologies of Web 2.0, greatly improves the responsiveness of Web applications and the user experience, and has therefore been widely adopted. However, because AJAX breaks the traditional Web architecture, which is based on static pages, traditional Web crawlers cannot cope with dynamic partial refresh and asynchronous loading. In this paper, we propose an efficient approach for crawling the information in dynamic pages by analyzing script language; it uses a path repository and judges the page refreshing state to improve the accuracy and efficiency of the algorithm. Experimental evaluation shows the efficiency and effectiveness of our approach.
Keywords
Internet; Java; XML; authoring languages; AJAX; URL; Web 2.0; Web crawlers; dynamic Web pages crawling; path repository; script language analysis; Algorithm design and analysis; Browsers; Crawlers; HTML; Heuristic algorithms; Servers; Switches; Web crawler; asynchronous loading; dynamic scripts analysis;
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Information Systems and Applications Conference (WISA), 2012 Ninth
Conference_Location
Haikou
Print_ISBN
978-1-4673-3054-1
Type
conf
DOI
10.1109/WISA.2012.34
Filename
6385179