• DocumentCode
    3266
  • Title

    Graph-Based Lexicon Regularization for PCFG With Latent Annotations

  • Author

    Zeng, Xiaodong ; Wong, Derek F. ; Chao, Lidia S. ; Trancoso, Isabel

  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
  • Volume
    23
  • Issue
    3
  • fYear
    2015
  • fDate
    March 2015
  • Firstpage
    441
  • Lastpage
    450
  • Abstract
    This paper aims to learn a better probabilistic context-free grammar with latent annotations (PCFG-LA) by using a graph propagation (GP) technique. We propose leveraging GP to regularize the lexical model of the grammar. The proposed approach constructs k-nearest neighbor (k-NN) similarity graphs over words with identical pre-terminal (part-of-speech) tags, for propagating the probabilities of latent annotations given the words. The graphs capture the relationship between words at the syntactic and semantic levels, estimated by using a neural word representation method based on a recursive autoencoder (RAE). We modify the conventional PCFG-LA parameter estimation algorithm, expectation maximization (EM), by incorporating a GP process subsequent to the M-step. The GP encourages smoothness among the graph vertices, so that different words in similar syntactic and semantic environments have approximate posterior distributions over nonterminal subcategories. The proposed PCFG-LA learning approach was evaluated together with a hierarchical split-and-merge training strategy, on parsing tasks for English, Chinese and Portuguese. The empirical results reveal two crucial findings: 1) regularizing the lexicons with GP has a positive effect on parsing accuracy; and 2) learning with unlabeled data can also expand the PCFG-LA lexicons.
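    The snippet below is a minimal sketch, not the authors' implementation, of the graph-propagation step outlined in the abstract: a k-NN similarity graph is built over word embeddings, and each word's posterior over latent annotations is interpolated with the similarity-weighted average of its neighbours' posteriors after the M-step, encouraging smoothness over the graph. The function names, the mixing weight alpha, and the iteration count are illustrative assumptions.

    ```python
    import numpy as np

    def knn_graph(embeddings, k):
        """Build a k-NN similarity graph (cosine similarity) over word embeddings."""
        normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sims = normed @ normed.T
        np.fill_diagonal(sims, -np.inf)                # exclude self-edges
        graph = []
        for i in range(sims.shape[0]):
            nbrs = np.argsort(sims[i])[-k:]            # indices of the k most similar words
            graph.append([(int(j), max(float(sims[i, j]), 0.0)) for j in nbrs])
        return graph

    def propagate(posteriors, graph, alpha=0.5, iterations=10):
        """Smooth per-word posteriors over latent annotations along the graph.

        Each word's distribution is interpolated between its EM estimate and the
        similarity-weighted average of its neighbours' distributions, then
        renormalized, so that similar words end up with similar distributions.
        """
        q = posteriors.copy()
        for _ in range(iterations):
            new_q = np.empty_like(q)
            for i, nbrs in enumerate(graph):
                total = sum(w for _, w in nbrs) + 1e-12
                nbr_avg = sum(w * q[j] for j, w in nbrs) / total
                new_q[i] = alpha * posteriors[i] + (1.0 - alpha) * nbr_avg
                new_q[i] /= new_q[i].sum()             # keep a valid distribution
            q = new_q
        return q

    # Example: 5 words with 8-dimensional embeddings and 4 latent subcategories.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(5, 8))
    post = rng.dirichlet(np.ones(4), size=5)           # EM posteriors q(a | word)
    smoothed = propagate(post, knn_graph(emb, k=2))
    ```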
  • Keywords
    context-free grammars; expectation-maximisation algorithm; graph theory; natural language processing; probability; Chinese; EM algorithm; English; PCFG-LA parameter estimation algorithm; Portuguese; expectation maximization; graph propagation technique; graph-based lexicon regularization; hierarchical split-and-merge training strategy; k-NN similarity graph; k-nearest neighbor; latent annotation; neural word representation; parsing tasks; part-of-speech tags; probabilistic context-free grammar; recursive autoencoder; semantic level; syntactic level; Grammar; Parameter estimation; Semantics; Syntactics; Training; Training data; Vectors; Graph propagation; natural language processing; neural word representation; syntax parsing
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2015.2389034
  • Filename
    7001570