Title :
Interpolated probabilistic tagging model optimized with genetic algorithm
Author :
Wong, Fa ; Chao, Sam ; Hu, Dong-Cheng ; Mao, W-Hang
Author_Institution :
Fac. of Sci. & Technol., Macao Univ., China
Abstract :
We present results of probabilistic tagging of Portuguese texts to show how these techniques perform on a highly inflective, morphologically ambiguous language when only a limited corpus is available as the training source. To cope with the ambiguity caused by insufficient training data, especially for unknown words, we incorporate lexical features into the probabilistic model. Unlike other proposed tagging models, ours introduces these features into the word probabilities by means of interpolation. We describe a technique, based on a genetic algorithm, for determining the optimal set of interpolation parameters. Our preliminary results show that our tagging model correctly tags 91.8% of the sentences.
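The abstract says that lexical features are interpolated into the word probabilities and that a genetic algorithm picks the interpolation weights. It does not give the features, the formula, or the GA settings. The Python sketch below illustrates the general idea, and every specific in it is a hypothetical assumption rather than the authors' method: the three features (word form, last three letters, capitalization), the form P(w|t) ≈ Σ_i λ_i P(f_i(w)|t) with weights summing to 1, the toy Portuguese data, and the GA operators (truncation selection, blend crossover, Gaussian mutation). For brevity, the sketch tags each word independently with argmax_t P(t)·P(w|t) and uses held-out word accuracy as the GA fitness. A real tagger would also use tag-transition probabilities and, per the abstract, would be evaluated at the sentence level.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus (illustrative only, not the paper's data).
TRAIN = [("casa", "N"), ("mesa", "N"), ("livro", "N"), ("carro", "N"),
         ("falar", "V"), ("cantar", "V"), ("comer", "V"),
         ("rapidamente", "ADV"), ("lentamente", "ADV"),
         ("Lisboa", "NP"), ("Porto", "NP")]
HELDOUT = [("mesa", "N"), ("saltar", "V"), ("felizmente", "ADV"),
           ("Coimbra", "NP"), ("comer", "V")]


def features(word):
    """Assumed lexical features: word form, 3-letter suffix, capitalization."""
    return [word.lower(), word.lower()[-3:], word[:1].isupper()]


def train(tagged):
    """Count (tag, feature value) co-occurrences separately for each feature."""
    counts = [defaultdict(Counter) for _ in features("x")]
    tag_totals = Counter()
    for word, tag in tagged:
        tag_totals[tag] += 1
        for i, value in enumerate(features(word)):
            counts[i][tag][value] += 1
    return counts, tag_totals


def interp_prob(word, tag, lambdas, counts, tag_totals, floor=1e-6):
    """Interpolated P(word | tag) = sum_i lambda_i * P(f_i(word) | tag)."""
    p = sum(lam * counts[i][tag][value] / tag_totals[tag]
            for i, (lam, value) in enumerate(zip(lambdas, features(word))))
    return max(p, floor)  # floor keeps words with no matching features above 0


def accuracy(lambdas, counts, tag_totals, heldout):
    """Fraction of held-out words whose argmax_t P(t) * P(w|t) is the gold tag."""
    n = sum(tag_totals.values())
    correct = 0
    for word, gold in heldout:
        guess = max(tag_totals, key=lambda t: tag_totals[t] / n
                    * interp_prob(word, t, lambdas, counts, tag_totals))
        correct += guess == gold
    return correct / len(heldout)


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def evolve(fitness, n_weights=3, pop_size=20, generations=30, mut_sd=0.1, seed=0):
    """Simple elitist GA over weight vectors that sum to 1."""
    rng = random.Random(seed)
    pop = [normalize([rng.random() + 1e-9 for _ in range(n_weights)])
           for _ in range(pop_size)]
    history = []  # best fitness at the start of each generation
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        parents = ranked[: pop_size // 2]  # truncation selection keeps the top half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            alpha = rng.random()
            child = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]  # blend crossover
            child = [max(1e-9, c + rng.gauss(0, mut_sd)) for c in child]  # Gaussian mutation
            children.append(normalize(child))
        pop = parents + children
    return max(pop, key=fitness), history


counts, tag_totals = train(TRAIN)
best, history = evolve(lambda w: accuracy(w, counts, tag_totals, HELDOUT))
print("weights:", [round(w, 3) for w in best],
      "accuracy:", accuracy(best, counts, tag_totals, HELDOUT))
```

On this toy data, the weights (1, 0, 0) rely on the word feature alone and tag only 2 of the 5 held-out words correctly, because every unseen word falls back to the most frequent tag. The weights (0.4, 0.4, 0.2) tag all five correctly. The GA searches for such weights automatically. Because the top half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.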
Keywords :
genetic algorithms; interpolation; probability; text analysis; Portuguese texts; genetic algorithm; interpolated probabilistic tagging model; word probability; Chaos; Natural language processing; Natural languages; Speech; Statistical analysis; Tagging; Training data;
Conference_Title :
Proceedings of the 2004 International Conference on Machine Learning and Cybernetics
Print_ISBN :
0-7803-8403-2
DOI :
10.1109/ICMLC.2004.1382237