DocumentCode :
3317262
Title :
Improved estimation for unsupervised part-of-speech tagging
Author :
Wang, Qin Iris ; Schuurmans, Dale
Author_Institution :
Dept. of Comput. Sci., Alberta Univ., Edmonton, Alta., Canada
fYear :
2005
fDate :
30 Oct.-1 Nov. 2005
Firstpage :
219
Lastpage :
224
Abstract :
We demonstrate that a simple hidden Markov model can achieve state-of-the-art performance in unsupervised part-of-speech tagging by improving aspects of standard Baum-Welch (EM) estimation. One improvement uses word similarities to smooth the lexical tag → word probability estimates, which avoids over-fitting the lexical model. Another improvement constrains the model to preserve a specified marginal distribution over the hidden tags, which avoids over-fitting the tag → tag transition model. Although using more contextual information than an HMM remains desirable, improving basic estimation still leads to significant improvements and remains a prerequisite for training more complex models.
Keywords :
hidden Markov models; natural languages; unsupervised learning; word processing; hidden Markov model; lexical model; lexical tag; standard Baum-Welch estimation; tag transition model; unsupervised part-of-speech tagging; word probability estimate; word similarity; Buildings; Context modeling; Entropy; Hidden Markov models; Iris; Parameter estimation; State estimation; Tagging; Training data; Unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE '05)
Print_ISBN :
0-7803-9361-9
Type :
conf
DOI :
10.1109/NLPKE.2005.1598738
Filename :
1598738