DocumentCode
476079
Title
A hybrid approach for web information extraction
Author
Xiao, Ji-yi ; Zhu, Dao-hui ; Zou, La-mei
Author_Institution
Sch. of Comput. Sci. & Technol., South China Univ., Hengyang
Volume
3
fYear
2008
fDate
12-15 July 2008
Firstpage
1560
Lastpage
1563
Abstract
This paper presents a hybrid approach to web information extraction based on maximum entropy (ME) and the maximum entropy Markov model (MEMM). The approach overcomes the lower precision and recall of the hidden Markov model (HMM), and it can also exploit many kinds of contextual information from the web. Experiments show that the hybrid approach achieves an average precision of 87.516%, whereas an HMM trained with the Baum-Welch algorithm achieves an average precision of 68.630%. This indicates that the hybrid approach outperforms the HMM trained with the Baum-Welch algorithm.
Keywords
Internet; hidden Markov models; information retrieval; knowledge acquisition; Web information extraction; hidden Markov model; maximum entropy method; Computer science; Cybernetics; Data mining; Electronic mail; Entropy; Hidden Markov models; Iterative algorithms; Machine learning; Probability distribution; Training data; Generalized iterative scaling; Hidden Markov model; Information extraction; Maximum entropy; Maximum entropy Markov model;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location
Kunming
Print_ISBN
978-1-4244-2095-7
Electronic_ISBN
978-1-4244-2096-4
Type
conf
DOI
10.1109/ICMLC.2008.4620654
Filename
4620654