Title :
Extract Deep Web-Detail Pages with Simple Tree Match
Author :
Wei Zhang ; Ye Deng ; Ranran Du ; Qiuhong Wang
Author_Institution :
Dept. of Comput. Sci. & Technol., Ocean Univ. of China, Qingdao, China
Abstract :
In this paper, we present a method for extracting data from Deep Web detail pages. The method uses Simple Tree Matching (STM) to compute the maximum match value between two trees and the Hungarian algorithm to trace the result of the STM computation; a tree-merge method then generates the wrapper. Finally, we use term frequency to optimize the wrapper. In our experiments, we use the wrapper to extract data; the results show that, compared with other methods, our method is feasible and effective.
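
The abstract names Simple Tree Matching without giving its recurrence, so the following is only a minimal sketch of the standard textbook STM formulation, not the authors' implementation. The `Node` class, the function name `simple_tree_match`, and the two sample page trees are hypothetical and introduced purely for illustration. The Hungarian-algorithm tracing, tree merging, and term-frequency steps are not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_match(a: Node, b: Node) -> int:
    """Return the number of matched node pairs in a maximum
    order-preserving matching between trees a and b."""
    # Roots with different labels cannot be matched at all.
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # table[i][j] = best score matching the first i subtrees of a
    # against the first j subtrees of b (dynamic programming).
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = table[i - 1][j - 1] + simple_tree_match(
                a.children[i - 1], b.children[j - 1]
            )
            table[i][j] = max(table[i - 1][j], table[i][j - 1], paired)
    # Add 1 for the matched pair of roots.
    return table[m][n] + 1


# Two made-up detail-page skeletons (assumed data, not from the paper).
page_a = Node("html", [Node("body", [
    Node("h1"),
    Node("table", [Node("tr"), Node("tr")]),
])])
page_b = Node("html", [Node("body", [
    Node("h1"),
    Node("div"),
    Node("table", [Node("tr")]),
])])

score = simple_tree_match(page_a, page_b)
print(score)  # matched pairs: html, body, h1, table, tr -> 5
```

In this sketch, a higher score means the two page trees share more of their structure, which is the kind of signal a wrapper-generation step could use to decide which nodes to align and merge.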
Keywords :
Internet; data mining; trees (mathematics); Hungarian algorithm; STM; Wrapper; deep Web-detail pages; max match value; simple tree match; term frequency; tree merge method; Data mining; Data models; HTML; Internet; Visualization; Web pages; World Wide Web; Deep Web; Hungarian Algorithm; STM; Web Extract;
Conference_Title :
Information Technology and Artificial Intelligence Conference (ITAIC), 2011 6th IEEE Joint International
Conference_Location :
Chongqing
Print_ISBN :
978-1-4244-8622-9
DOI :
10.1109/ITAIC.2011.6030197