• DocumentCode
    2544735
  • Title

    A Novel Method for the Web Page Segmentation and Identification

  • Author

    Wang, Jing ; Liu, Zhijing

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Xidian Univ., Xian
  • Volume
    1
  • fYear
    2009
  • fDate
    22-24 Jan. 2009
  • Firstpage
    229
  • Lastpage
    231
  • Abstract
    This paper presents a method for Web page segmentation and recognition based on a generalized hidden Markov model (GHMM), drawing on both the page content and its structural layout. The method effectively divides a Web page into blocks and then identifies each block with the GHMM. Experimental results indicate that, compared with the original page segmentation algorithm, the operating efficiency of this algorithm improves by 14.3% and the segmentation quality improves significantly.
  • Keywords
    Internet; hidden Markov models; Web page identification; Web page segmentation; generalized hidden Markov model; Computer science; Data mining; Electronic mail; Frequency; Hidden Markov models; Navigation; Space technology; State estimation; Text processing; Web pages; A Generalized Hidden Markov Model (GHMM); Web Page Identification; Web Page Segmentation;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Engineering and Technology, 2009. ICCET '09. International Conference on
  • Conference_Location
    Singapore
  • Print_ISBN
    978-1-4244-3334-6
  • Type

    conf

  • DOI
    10.1109/ICCET.2009.149
  • Filename
    4769461