Title :
A Novel Method for the Web Page Segmentation and Identification
Author :
Wang, Jing ; Liu, Zhijing
Author_Institution :
Sch. of Comput. Sci. & Technol., Xidian Univ., Xian
Abstract :
This paper presents a method for Web page segmentation and recognition based on a generalized hidden Markov model (GHMM) that uses both the page content and its structural layout. The method effectively divides a Web page into blocks and then identifies each block with the GHMM. Experimental results indicate that, compared with the original page segmentation algorithm, the proposed algorithm improves operating efficiency by 14.3% and significantly improves segmentation quality.
Keywords :
Internet; hidden Markov models; Web page identification; Web page segmentation; generalized hidden Markov model (GHMM); Computer science; Data mining; Electronic mail; Frequency; Navigation; Space technology; State estimation; Text processing; Web pages
Conference_Title :
Computer Engineering and Technology, 2009. ICCET '09. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-3334-6
DOI :
10.1109/ICCET.2009.149