DocumentCode :
3025715
Title :
Extracting Structured Data from Ajax Site
Author :
Xia, Tian
Author_Institution :
Key Lab. of Data Eng. & Knowledge Eng., Renmin Univ. of China Beijing, Beijing, China
fYear :
2009
fDate :
25-26 April 2009
Firstpage :
259
Lastpage :
262
Abstract :
Ajax is an important approach for improving interactivity between Web servers and end users in the Web 2.0 era. However, the structured data in Ajax Web pages cannot be extracted easily because it is loaded asynchronously. In this paper, we propose a technique for extracting structured data from Ajax-based Web pages. First, an AjaxFetcher component is created to fetch the dynamic page content using an embedded browser. Second, two different strategies are used to extract the structured data from the fetched page content. In particular, for pages that contain multiple records, an automatic approach for identifying each candidate record is proposed. Experimental results show that fetching Ajax pages and extracting structured data from them is feasible.
Keywords :
Internet; Java; XML; Ajax based Web pages; AjaxFetcher component; Asynchronous JavaScript and XML; Web 2.0; Data engineering; Data mining; Databases; Information management; Information resources; Knowledge engineering; Knowledge management; Laboratories; Uniform resource locators; Web pages; Ajax; crawler; deep web;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Database Technology and Applications, 2009 First International Workshop on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3604-0
Type :
conf
DOI :
10.1109/DBTA.2009.14
Filename :
5207767