DocumentCode :
2665169
Title :
Using hidden Markov model for information extraction based on multiple templates
Author :
Liu, Yunzhong ; Lin, Yaping ; Chen, Zhiping
Author_Institution :
Coll. of Comput. & Commun., Hunan Univ., Changsha, China
fYear :
2003
fDate :
26-29 Oct. 2003
Firstpage :
394
Lastpage :
399
Abstract :
Recent research has demonstrated the strong performance of hidden Markov models applied to information extraction, the task of populating database slots with corresponding phrases from text documents. Training data from different sources often differs in format even when the content is similar. In previous information extraction research, all the training data is mixed together to learn the hidden Markov model parameters. Because the training data as a whole is then multicomponent, it is difficult for statistical learning techniques to find optimal model parameters. We present a new algorithm using hidden Markov models for information extraction based on multiple templates, which first clusters the training data into multiple templates based on format, then learns the model structure parameters from the clustered training data and the model emission probability parameters from the initial training data. The experimental results show that the new algorithm outperforms the original one, which does not cluster the training data into multiple templates, in both precision and recall.
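The pipeline outlined in the abstract (cluster by format, learn structure per template, learn emissions from all data) can be illustrated with a minimal sketch. The abstract does not say how formats are compared or how parameters are estimated, so everything here is an assumption made for illustration: the format signature (the order in which field labels first appear), the maximum-likelihood counting, the `<s>`/`</s>` boundary markers, and all function names and toy data are hypothetical, not the authors' method.

```python
from collections import Counter, defaultdict

def format_signature(seq):
    # Assumed notion of "format": the ordered run of distinct labels.
    sig = []
    for _, label in seq:
        if not sig or sig[-1] != label:
            sig.append(label)
    return tuple(sig)

def cluster_by_format(data):
    # Group labeled sequences that share a signature into one template.
    templates = defaultdict(list)
    for seq in data:
        templates[format_signature(seq)].append(seq)
    return dict(templates)

def normalize(counts):
    # Turn nested counts into conditional probability tables.
    return {k: {x: c / sum(row.values()) for x, c in row.items()}
            for k, row in counts.items()}

def transition_probs(seqs):
    # Structure parameters, estimated from one template's sequences only.
    counts = defaultdict(Counter)
    for seq in seqs:
        labels = [label for _, label in seq]
        for prev, nxt in zip(["<s>"] + labels, labels + ["</s>"]):
            counts[prev][nxt] += 1
    return normalize(counts)

def emission_probs(data):
    # Emission parameters, estimated from the full (unclustered) data.
    counts = defaultdict(Counter)
    for seq in data:
        for token, label in seq:
            counts[label][token] += 1
    return normalize(counts)

# Toy labeled data: each sequence is a list of (token, label) pairs.
data = [
    [("Using", "title"), ("HMMs", "title"), ("Liu", "author")],
    [("Lin", "author"), ("Fast", "title"), ("Parsing", "title")],
    [("Chen", "author"), ("Data", "title")],
]

templates = cluster_by_format(data)
structures = {sig: transition_probs(seqs) for sig, seqs in templates.items()}
emissions = emission_probs(data)
```

In this toy data the first sequence puts the title before the author and the other two put the author first, so two templates result, each with its own transition table, while the single emission table is shared across them.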
Keywords :
hidden Markov models; learning (artificial intelligence); natural languages; hidden Markov model; information extraction; model emission probability parameter; multiple templates; optimal model parameter; statistical learning technique; Clustering algorithms; Data mining; Databases; Educational institutions; Entropy; Filling; Hidden Markov models; Search engines; Statistical learning; Training data;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003 International Conference on
Conference_Location :
Beijing, China
Print_ISBN :
0-7803-7902-0
Type :
conf
DOI :
10.1109/NLPKE.2003.1275937
Filename :
1275937