DocumentCode :
3266
Title :
Graph-Based Lexicon Regularization for PCFG With Latent Annotations
Author :
Xiaodong Zeng ; Wong, Derek F. ; Chao, Lidia S. ; Trancoso, Isabel
Author_Institution :
Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
Volume :
23
Issue :
3
fYear :
2015
fDate :
March 2015
Firstpage :
441
Lastpage :
450
Abstract :
This paper aims at learning a better probabilistic context-free grammar with latent annotations (PCFG-LA) by using a graph propagation (GP) technique. We propose leveraging GP to regularize the lexical model of the grammar. The proposed approach constructs k-nearest neighbor (k-NN) similarity graphs over words sharing the same pre-terminal (part-of-speech) tag, for propagating the probabilities of latent annotations given the words. The graphs capture syntactic and semantic relationships between words, estimated using a neural word representation method based on recursive autoencoders (RAE). We modify the conventional PCFG-LA parameter estimation algorithm, expectation maximization (EM), by incorporating a GP step after the M-step. The GP encourages smoothness among the graph vertices, so that words occurring in similar syntactic and semantic environments receive similar posterior distributions over nonterminal subcategories. The proposed PCFG-LA learning approach was evaluated together with a hierarchical split-and-merge training strategy, on parsing tasks for English, Chinese, and Portuguese. The empirical results reveal two crucial findings: 1) regularizing the lexicons with GP has a positive effect on parsing accuracy; and 2) learning with unlabeled data can also expand the PCFG-LA lexicons.
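Illustration (not part of the original record): the abstract describes building a k-NN similarity graph over words and smoothing each word's posterior over latent subcategories toward its neighbors after the EM M-step. The sketch below, in Python, shows one plausible form of such a propagation update under stated assumptions; the function names, the cosine-similarity graph, the interpolation weight mu, and the random stand-in data are all hypothetical and not taken from the paper, which uses RAE-derived word representations and its own GP objective.

    import numpy as np

    def knn_graph(vectors, k=3):
        # Symmetric k-NN similarity graph (cosine similarity) over word vectors.
        # Hypothetical helper; the paper derives vectors from an RAE model.
        norm = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        sim = norm @ norm.T
        n = len(vectors)
        W = np.zeros((n, n))
        for i in range(n):
            nbrs = np.argsort(-sim[i])[1:k + 1]   # skip the word itself
            W[i, nbrs] = sim[i, nbrs]
        return np.maximum(W, W.T)                 # symmetrize

    def graph_propagate(Q, W, mu=0.5, iters=20):
        # Smooth per-word posteriors Q (rows: words, cols: latent subtags) over
        # the graph W: pull each row toward the weighted average of its
        # neighbors, then renormalize so rows remain distributions.
        # This is one possible GP update, not necessarily the paper's exact one.
        deg = W.sum(axis=1, keepdims=True) + 1e-12
        for _ in range(iters):
            neighbor_avg = (W @ Q) / deg
            Q = (1 - mu) * Q + mu * neighbor_avg
            Q = Q / Q.sum(axis=1, keepdims=True)
        return Q

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        word_vecs = rng.normal(size=(10, 8))             # stand-in for RAE embeddings
        posteriors = rng.dirichlet(np.ones(4), size=10)  # stand-in M-step lexical posteriors
        W = knn_graph(word_vecs, k=3)
        print(graph_propagate(posteriors, W).round(3))

In the paper's training loop this kind of smoothing would sit between the M-step and the next E-step, so the regularized lexical probabilities feed back into the subsequent EM iterations.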
Keywords :
context-free grammars; expectation-maximisation algorithm; graph theory; natural language processing; probability; Chinese; EM algorithm; English; PCFG-LA parameter estimation algorithm; Portuguese; expectation maximization; graph propagation technique; graph-based lexicon regularization; hierarchical split-and-merge training strategy; k-NN similarity graph; k-nearest neighbor; latent annotation; neural word representation; parsing tasks; part-of-speech tags; probabilistic context-free grammar; recursive autoencoder; semantic level; syntactic level; Grammar; Parameter estimation; Semantics; Syntactics; Training; Training data; Vectors; Graph propagation; natural language processing; neural word representation; syntax parsing;
fLanguage :
English
Journal_Title :
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Publisher :
IEEE
ISSN :
2329-9290
Type :
jour
DOI :
10.1109/TASLP.2015.2389034
Filename :
7001570