Title :
Learning automatic acquisition of subcategorization frames using Bayesian inference and support vector machines
Author :
Maragoudakis, M. ; Kermanidis, K. ; Fakotakis, N. ; Kokkinakis, G.
Author_Institution :
Dept. of Electr. & Comput. Eng., Patras Univ., Greece
Abstract :
Bayesian belief networks (BBN) learned from corpora and support vector machines (SVM) have been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We incorporate minimal linguistic resources, i.e. basic morphological tagging and phrase chunking, to demonstrate that verb subcategorization, which is of great significance for developing robust natural language human-computer interaction systems, can be achieved using large corpora without any general-purpose syntactic parser. In addition to BBN and SVM, which have not previously been used for this task, we have experimented with three well-known machine learning methods (feedforward backpropagation neural networks, learning vector quantization and decision tables), which are also being applied to the task of verb subcategorization frame detection for the first time. We argue that both BBN and SVM are well suited for learning to identify verb subcategorization frames, and empirical results support this claim. Performance has been methodically evaluated using two different corpus types, one balanced and one domain-specific, in order to determine the unbiased behaviour of the trained models. Limited training data are shown to yield satisfactory results. We have been able to achieve precision exceeding 80% on the identification of subcategorization frames which were not known beforehand.
Keywords :
backpropagation; belief networks; computational linguistics; feedforward neural nets; inference mechanisms; learning automata; vector quantisation; Bayesian belief network learning; Bayesian inference; Modern Greek; automatic verb subcategorization frame acquisition; corpora; decision tables; feedforward backpropagation neural networks; learning vector quantization; machine learning methods; minimal linguistic resources; morphological tagging; natural language human computer interaction systems; performance evaluation; phrase chunking; support vector machines; Backpropagation; Bayesian methods; Human computer interaction; Learning automata; Learning systems; Machine learning; Natural languages; Robustness; Support vector machines; Tagging;
Conference_Title :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
DOI :
10.1109/ICDM.2001.989583