Title :
P2DHMM: A Novel Web Object Information Extraction Model
Author :
Wang, Jing ; Liu, Zhijing
Author_Institution :
Sch. of Comput. Sci. & Technol., Xidian Univ., Xi'an
Abstract :
Because Web pages differ from plain text documents, this paper introduces the concept of a Web object. Building on the pseudo two-dimensional hidden Markov model (P2D-HMM), it refines the assumed state transition and emission symbol conditions and proposes a novel method for extracting Web object information. An example shows that the proposed method achieves high precision in Web object information extraction.
Keywords :
Internet; hidden Markov models; information retrieval; Web object information extraction model; Web page; emission symbol condition; plain text document; pseudo two dimension hidden Markov model; state transition; Computer science; Data mining; Dictionaries; Electronic mail; HTML; Information analysis; Spatial databases; Web pages; Information Extraction (IE); Pseudo two-dimension Hidden Markov Model (P2D-HMM); Web Object
Conference_Title :
Computer Engineering and Technology, 2009. ICCET '09. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-1-4244-3334-6
DOI :
10.1109/ICCET.2009.147