DocumentCode
3447338
Title
Extract Deep Web-Detail Pages with Simple Tree Match
Author
Wei Zhang ; Ye Deng ; Ranran Du ; Qiuhong Wang
Author_Institution
Dept. of Comput. Sci. & Technol., Ocean Univ. of China, Qingdao, China
Volume
1
fYear
2011
fDate
20-22 Aug. 2011
Firstpage
250
Lastpage
254
Abstract
In this paper, we present a method for extracting data from Deep Web-Detail Pages. The method uses Simple Tree Match (STM) to compute the maximum match value between two trees and the Hungarian algorithm to trace the result of the STM computation; a tree merge method then generates the Wrapper. Finally, we use Term Frequency to optimize the Wrapper. In the experiments, we use the Wrapper to extract data; the results show that our method is feasible and effective compared with other methods.
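The abstract names Simple Tree Match as the step that scores how well two page trees align. As an illustration only, here is a minimal sketch of the commonly described STM recurrence (dynamic programming over ordered children, with each matched node pair contributing 1). The `Node` class, the function name, and the sample trees are assumptions made for this example and are not taken from the paper; the Hungarian-algorithm trace, tree merge, and Term Frequency steps are not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical DOM-like node: a tag name plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_match(a: Node, b: Node) -> int:
    """Return the size of the largest top-down, order-preserving match."""
    # Roots with different tags cannot be matched at all.
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # best[i][j]: best total for the first i children of a and first j of b.
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = best[i - 1][j - 1] + simple_tree_match(
                a.children[i - 1], b.children[j - 1]
            )
            best[i][j] = max(best[i][j - 1], best[i - 1][j], paired)
    # Add 1 for the matched pair of roots.
    return best[m][n] + 1


# Two small, made-up page skeletons.
page_a = Node("html", [Node("body", [Node("div", [Node("p"), Node("p")]),
                                     Node("span")])])
page_b = Node("html", [Node("body", [Node("div", [Node("p")]),
                                     Node("span"), Node("a")])])
score = simple_tree_match(page_a, page_b)
```

For these two trees the matched pairs are `html`, `body`, `div`, one `p`, and `span`, so `score` is 5. A higher score relative to tree size suggests the two pages share more template structure.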
Keywords
Internet; data mining; trees (mathematics); Hungarian algorithm; STM; Wrapper; deep Web-detail pages; max match value; simple tree match; term frequency; tree merge method; Data models; HTML; Visualization; Web pages; World Wide Web; Deep Web; Web Extract
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Technology and Artificial Intelligence Conference (ITAIC), 2011 6th IEEE Joint International
Conference_Location
Chongqing
Print_ISBN
978-1-4244-8622-9
Type
conf
DOI
10.1109/ITAIC.2011.6030197
Filename
6030197