Title :
Extracting Structured Data from Ajax Site
Author_Institution :
Key Lab. of Data Eng. & Knowledge Eng., Renmin Univ. of China Beijing, Beijing, China
Abstract :
Ajax is an important approach for improving the interactivity between Web servers and end users in the Web 2.0 era. However, the structured data in Ajax Web pages cannot be extracted easily because the content is loaded asynchronously. In this paper, we propose a technique for extracting structured data from Ajax-based Web pages. First, an AjaxFetcher component is created to fetch the dynamic page content by using an embedded browser. Second, two different strategies are used to extract the structured data from the fetched page content. In particular, for pages that contain multiple records, an automatic approach to identifying each candidate record is proposed. Experimental results show that fetching Ajax pages and extracting structured data from them is feasible.
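The abstract does not say how candidate records are located on a multi-record page. A widely used heuristic for this problem is to look for sibling subtrees that repeat the same structure. The sketch below illustrates that general idea on a toy tree. It assumes the Ajax page has already been rendered (e.g. by an embedded browser) and parsed into a DOM-like tree. The `Node` class, the `signature` rule and `candidate_records` are hypothetical illustrations, not the authors' algorithm.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a rendered DOM element."""
    tag: str
    children: list["Node"] = field(default_factory=list)
    text: str = ""


def signature(node: Node) -> str:
    """Structural signature: the tag plus the ordered tags of its direct children."""
    return node.tag + "(" + ",".join(child.tag for child in node.children) + ")"


def iter_nodes(root: Node):
    """Depth-first, pre-order traversal of the tree."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        stack.extend(reversed(node.children))


def candidate_records(root: Node, min_repeats: int = 2) -> list[Node]:
    """Return the largest group of sibling subtrees sharing one signature.

    Heuristic assumption: on a multi-record page, records are siblings
    under a common parent and have the same internal structure.
    """
    best: list[Node] = []
    for parent in iter_nodes(root):
        counts = Counter(signature(child) for child in parent.children)
        if not counts:
            continue  # leaf node: no children to compare
        sig, n = counts.most_common(1)[0]
        if n >= min_repeats and n > len(best):
            best = [c for c in parent.children if signature(c) == sig]
    return best


if __name__ == "__main__":
    def item(name: str, price: str) -> Node:
        return Node("li", [Node("span", text=name), Node("em", text=price)])

    page = Node("body", [
        Node("div", [Node("h1", text="Results")]),
        Node("ul", [item("A", "1"), item("B", "2"), item("C", "3")]),
    ])
    for record in candidate_records(page):
        print([child.text for child in record.children])
```

On this toy page the three `li` items share the signature `li(span,em)`, so they are returned as the candidate records, while the one-off heading block is ignored. Real pages would need a more tolerant similarity measure than exact signature matching.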
Keywords :
Internet; Java; XML; Ajax based Web pages; AjaxFetcher component; Asynchronous JavaScript and XML; Web 2.0; Data engineering; Data mining; Databases; Information management; Information resources; Knowledge engineering; Knowledge management; Laboratories; Uniform resource locators; Web pages; Ajax; crawler; deep web;
Conference_Title :
Database Technology and Applications, 2009 First International Workshop on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3604-0
DOI :
10.1109/DBTA.2009.14