DocumentCode :
2544735
Title :
A Novel Method for the Web Page Segmentation and Identification
Author :
Wang, Jing ; Liu, Zhijing
Author_Institution :
Sch. of Comput. Sci. & Technol., Xidian Univ., Xian
Volume :
1
fYear :
2009
fDate :
22-24 Jan. 2009
Firstpage :
229
Lastpage :
231
Abstract :
This paper presents a method of Web page segmentation and recognition based on a generalized hidden Markov model (GHMM), which draws on both the page content and its structural layout. The method effectively segments a Web page and then identifies each page block with the GHMM. Experimental results indicate that, compared with the original page segmentation algorithm, the operating efficiency of this algorithm improves by 14.3%, and the segmentation quality improves significantly.
Keywords :
Internet; hidden Markov models; Web page identification; Web page segmentation; generalized hidden Markov model; Computer science; Data mining; Electronic mail; Frequency; Hidden Markov models; Navigation; Space technology; State estimation; Text processing; Web pages; A Generalized Hidden Markov Model (GHMM); Web Page Identification; Web Page Segmentation;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Engineering and Technology, 2009. ICCET '09. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-3334-6
Type :
conf
DOI :
10.1109/ICCET.2009.149
Filename :
4769461