  • DocumentCode
    3447338
  • Title
    Extract Deep Web-Detail Pages with Simple Tree Match
  • Author
    Wei Zhang; Ye Deng; Ranran Du; Qiuhong Wang
  • Author_Institution
    Dept. of Comput. Sci. & Technol., Ocean Univ. of China, Qingdao, China
  • Volume
    1
  • fYear
    2011
  • fDate
    20-22 Aug. 2011
  • Firstpage
    250
  • Lastpage
    254
  • Abstract
    In this paper, we provide a method to extract data from Deep Web detail pages. The method uses Simple Tree Match (STM) to compute the maximum match value between two trees and the Hungarian algorithm to trace the result of the STM computation; a tree merge method then generates the wrapper. Finally, term frequency is used to optimize the wrapper. In experiments, we use the wrapper to extract data, and the results show that, compared with other methods, our method is feasible and effective.
  • Keywords
    Internet; data mining; trees (mathematics); Hungarian algorithm; STM; Wrapper; deep Web-detail pages; max match value; simple tree match; term frequency; tree merge method; Data mining; Data models; HTML; Internet; Visualization; Web pages; World Wide Web; Deep Web; Hungarian Algorithm; STM; Web Extract;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Information Technology and Artificial Intelligence Conference (ITAIC), 2011 6th IEEE Joint International
  • Conference_Location
    Chongqing
  • Print_ISBN
    978-1-4244-8622-9
  • Type
    conf
  • DOI
    10.1109/ITAIC.2011.6030197
  • Filename
    6030197
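The first step named in the abstract, computing the maximum match value between two trees with Simple Tree Match, can be illustrated with the textbook STM recurrence: two nodes can match only if their labels agree, and their children are aligned by a dynamic program that preserves sibling order. The following is a minimal Python sketch under that assumption, not the authors' implementation. The `Node` class, the `simple_tree_match` function, and the example trees are hypothetical. The paper's later steps (Hungarian-algorithm tracing, tree merging, and term-frequency optimization) are not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical minimal DOM node: a tag label plus ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_match(a: Node, b: Node) -> int:
    """Return the maximum number of matching node pairs between trees a and b.

    Classic Simple Tree Matching: roots must carry the same label, a node can
    only match if its parent matched, and sibling order is preserved.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # M[i][j] = best match count using the first i children of a
    # and the first j children of b (row/column 0 means "no children").
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            w = simple_tree_match(a.children[i - 1], b.children[j - 1])
            M[i][j] = max(M[i][j - 1], M[i - 1][j], M[i - 1][j - 1] + w)
    # +1 counts the matched roots themselves.
    return M[m][n] + 1


# Two small hypothetical detail-page fragments.
t1 = Node("div", [Node("h1"), Node("p"), Node("table", [Node("tr"), Node("tr")])])
t2 = Node("div", [Node("h1"), Node("table", [Node("tr")]), Node("span")])

print(simple_tree_match(t1, t2))  # 4: div, h1, table and one tr match
print(simple_tree_match(t1, t1))  # 6: every node matches itself
```

In this sketch the recurrence returns only the match count. Recovering which node pairs produced that count requires tracing back through the matrices, which is the role the abstract assigns to the Hungarian algorithm.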