DocumentCode
2465012
Title
Web information extraction based on hidden Markov model
Author
Lai, Jianbing ; Liu, Qiang ; Liu, Yi
Author_Institution
School of Software, Tsinghua University, Beijing, China
fYear
2010
fDate
14-16 April 2010
Firstpage
234
Lastpage
238
Abstract
This paper proposes a hidden Markov model based on semantic blocks. Semantic blocks are segmented from the information extracted from various websites, based on their semi-structured nature. The model uses the semantic block, rather than the word, as the basic element of an observation sequence in order to improve the accuracy and efficiency of the transition matrix. It also optimizes the observation probability distribution and the estimation accuracy of the state transition sequence by adopting a “voting strategy” and modifying the Viterbi algorithm. Experimental results show that the new model and algorithms achieve satisfactory recall and precision for web information extraction.
Keywords
Algorithm design and analysis; Collaborative work; Data mining; Dictionaries; Hidden Markov models; Internet; Probability distribution; State estimation; Viterbi algorithm; Voting; hidden Markov model; semantic block; semi-structure; voting strategy;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Supported Cooperative Work in Design (CSCWD), 2010 14th International Conference on
Conference_Location
Shanghai, China
Print_ISBN
978-1-4244-6763-1
Type
conf
DOI
10.1109/CSCWD.2010.5471969
Filename
5471969