Title : 
Evolving natural language grammars without supervision
         
        
            Author : 
Araujo, L.; Santamaría, Jesús
         
        
            Author_Institution : 
Dept. de Lenguajes y Sistemas Informáticos, UNED, Madrid, Spain

            Abstract : 
Unsupervised grammar induction is one of the most difficult tasks in language processing. Its goal is to extract a grammar representing the structure of a language from texts that carry no annotation of that structure. We have devised an evolutionary algorithm that, for each sentence, evolves a population of trees representing different candidate parses of that sentence. Each of these trees represents a part of a grammar. The evaluation function takes into account the contexts in which each sequence of part-of-speech tags (POSseq) appears in the training corpus, as well as the frequencies of those POSseqs and contexts. The grammar for the whole training corpus is constructed incrementally. The algorithm has been evaluated on a well-known annotated English corpus, with the annotations used only for evaluation purposes. Results indicate that the proposed algorithm improves on the results of a classical optimization algorithm, EM (Expectation Maximization), for short grammar constituents (the right-hand sides of the grammar rules), and that its precision is better in general.
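
The abstract says the evaluation function relies on POSseq frequencies and on the contexts in which each POSseq occurs. As a rough illustration only, the sketch below gathers those two statistics from a POS-tagged corpus. It is not the authors' evaluation function, which the abstract does not specify. The function name `posseq_statistics`, the `max_len` cap, the boundary markers `<s>` and `</s>`, and the definition of a context as the pair (tag immediately before, tag immediately after) are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def posseq_statistics(tagged_sentences, max_len=3):
    """Count POS-tag sequences (POSseqs) and the contexts around them.

    Assumption: a context is the pair (previous tag, next tag), with
    hypothetical boundary markers "<s>" and "</s>" at sentence edges.
    """
    freq = Counter()                 # POSseq -> number of occurrences
    contexts = defaultdict(Counter)  # POSseq -> Counter of (left, right) pairs

    for tags in tagged_sentences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        n = len(padded)
        # Real tags occupy positions 1 .. n-2 of the padded sentence.
        for start in range(1, n - 1):
            # `end` is exclusive and never runs past the closing marker.
            for end in range(start + 1, min(start + max_len, n - 1) + 1):
                seq = tuple(padded[start:end])
                freq[seq] += 1
                contexts[seq][(padded[start - 1], padded[end])] += 1

    return freq, contexts
```

Called on a list of tag sequences such as `[["DT", "NN", "VBD"], ["DT", "NN", "VBZ", "DT", "NN"]]` (Penn Treebank-style labels chosen purely for illustration), the function returns the occurrence count of every POSseq up to `max_len` tags and the distribution of contexts it was seen in. A fitness measure could then, for instance, favour POSseqs that are both frequent and seen in varied contexts, although how the paper actually combines these quantities is not stated here.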
         
        
            Keywords : 
evolutionary computation; grammars; natural language processing; unsupervised learning; POSseq; evolutionary algorithm; grammar representation; language structure; natural language grammars; part-of-speech tags; unsupervised grammar induction; Artificial neural networks; Context; Evolutionary computation; Grammar; Natural languages; Particle separators; Training
         
        
        
        
            Conference_Title : 
Evolutionary Computation (CEC), 2010 IEEE Congress on
         
        
            Conference_Location : 
Barcelona
         
        
            Print_ISBN : 
978-1-4244-6909-3
         
        
        
            DOI : 
10.1109/CEC.2010.5586291