DocumentCode :
2330381
Title :
Evolving natural language grammars without supervision
Author :
Araujo, L.; Santamaría, Jesús
Author_Institution :
Dept. de Lenguajes y Sist. Informaticos, UNED, Madrid, Spain
fYear :
2010
fDate :
18-23 July 2010
Firstpage :
1
Lastpage :
8
Abstract :
Unsupervised grammar induction is one of the most difficult tasks in language processing. Its goal is to extract a grammar representing the language structure from texts that carry no annotations of this structure. We have devised an evolutionary algorithm which, for each sentence, evolves a population of trees that represent different parses of that sentence. Each of these trees represents a part of a grammar. The evaluation function takes into account the contexts in which each sequence of Part-Of-Speech tags (POSseq) appears in the training corpus, as well as the frequencies of those POSseqs and contexts. The grammar for the whole training corpus is constructed incrementally. The algorithm has been evaluated using a well-known annotated English corpus, though the annotations have been used only for evaluation purposes. Results indicate that the proposed algorithm improves on the results of a classical optimization algorithm, such as EM (Expectation Maximization), for short grammar constituents (right-hand side of the grammar rules), and that its precision is better in general.
Keywords :
evolutionary computation; grammars; natural language processing; unsupervised learning; POSseq; evolutionary algorithm; grammar representation; language structure; natural language grammars; part-of-speech tags; unsupervised grammar induction; Artificial neural networks; Context; Evolutionary computation; Grammar; Natural languages; Particle separators; Training;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Evolutionary Computation (CEC), 2010 IEEE Congress on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-6909-3
Type :
conf
DOI :
10.1109/CEC.2010.5586291
Filename :
5586291