DocumentCode :
2858536
Title :
Web Information Extraction Based on Hierarchical Model
Author :
Liu, Yaqing ; Chen, Rong ; Yang, Hong
Author_Institution :
Sch. of Inf. Sci. & Technol., Dalian Maritime Univ., Dalian, China
fYear :
2009
fDate :
11-13 Dec. 2009
Firstpage :
1
Lastpage :
5
Abstract :
A hierarchical extraction model based on the hidden Markov model is proposed after analyzing existing algorithms for Web information extraction. We first annotate atom information items and compound information items in HTML documents, then use a bottom-up clustering method to build a DOM+ tree. Finally, we combine the annotations of the atom and compound information items with the compound information items' paths in the DOM+ tree to build the hierarchical extraction model. Experiments show that using the hierarchical extraction model can yield better performance.
Keywords :
hidden Markov models; hypermedia markup languages; knowledge acquisition; DOM+ tree; HTML documents; Web information extraction; atom information items; bottom-up clustering method; compound information items; hidden Markov model; hierarchical extraction model; Data mining; Database languages; HTML; Hidden Markov models; Induction generators; Information science; Mathematical model; Search engines; Web pages; World Wide Web
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence and Software Engineering, 2009. CiSE 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4507-3
Electronic_ISBN :
978-1-4244-4507-3
Type :
conf
DOI :
10.1109/CISE.2009.5365870
Filename :
5365870