DocumentCode
2544735
Title
A Novel Method for the Web Page Segmentation and Identification
Author
Wang, Jing ; Liu, Zhijing
Author_Institution
Sch. of Comput. Sci. & Technol., Xidian Univ., Xian
Volume
1
fYear
2009
fDate
22-24 Jan. 2009
Firstpage
229
Lastpage
231
Abstract
This paper presents a method for Web page segmentation and recognition based on a generalized hidden Markov model (GHMM) that takes into account both the page content and its structural layout. The method effectively divides a Web page into blocks and then identifies each block with the GHMM. Experimental results indicate that, compared with the original page segmentation algorithm, the proposed algorithm improves operating efficiency by 14.3% and significantly improves segmentation quality.
Keywords
Internet; hidden Markov models; Web page identification; Web page segmentation; generalized hidden Markov model; Computer science; Data mining; Electronic mail; Frequency; Hidden Markov models; Navigation; Space technology; State estimation; Text processing; Web pages; A Generalized Hidden Markov Model (GHMM); Web Page Identification; Web Page Segmentation;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Engineering and Technology, 2009. ICCET '09. International Conference on
Conference_Location
Singapore
Print_ISBN
978-1-4244-3334-6
Type
conf
DOI
10.1109/ICCET.2009.149
Filename
4769461