• DocumentCode
    3266
  • Title

    Graph-Based Lexicon Regularization for PCFG With Latent Annotations

  • Author

    Zeng, Xiaodong ; Wong, Derek F. ; Chao, Lidia S. ; Trancoso, Isabel

  • Author_Institution
    Dept. of Comput. & Inf. Sci., Univ. of Macau, Macau, China
  • Volume
    23
  • Issue
    3
  • fYear
    2015
  • fDate
    March 2015
  • Firstpage
    441
  • Lastpage
    450
  • Abstract
    This paper aims to learn a better probabilistic context-free grammar with latent annotations (PCFG-LA) by using a graph propagation (GP) technique. We propose leveraging GP to regularize the lexical model of the grammar. The proposed approach constructs k-nearest neighbor (k-NN) similarity graphs over words with identical pre-terminal (part-of-speech) tags, for propagating the probabilities of latent annotations given the words. The graphs capture the relationship between words at the syntactic and semantic levels, estimated by using a neural word representation method based on a recursive autoencoder (RAE). We modify the conventional PCFG-LA parameter estimation algorithm, expectation maximization (EM), by incorporating a GP process subsequent to the M-step. The GP encourages smoothness among the graph vertices, so that different words in similar syntactic and semantic environments have approximate posterior distributions over nonterminal subcategories. The proposed PCFG-LA learning approach was evaluated together with a hierarchical split-and-merge training strategy, on parsing tasks for English, Chinese and Portuguese. The empirical results reveal two crucial findings: 1) regularizing the lexicons with GP has a positive effect on parsing accuracy; and 2) learning with unlabeled data can also expand the PCFG-LA lexicons.
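    The snippet below is a minimal sketch, not the authors' implementation, of the graph-propagation step outlined in the abstract: a k-NN similarity graph is built over word embeddings, and each word's posterior over latent annotations is interpolated with the similarity-weighted average of its neighbours' posteriors after the M-step, encouraging smoothness over the graph. The function names, the mixing weight alpha, and the iteration count are illustrative assumptions.

    ```python
    import numpy as np

    def knn_graph(embeddings, k):
        """Build a k-NN similarity graph (cosine similarity) over word embeddings."""
        normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sims = normed @ normed.T
        np.fill_diagonal(sims, -np.inf)                # exclude self-edges
        graph = []
        for i in range(sims.shape[0]):
            nbrs = np.argsort(sims[i])[-k:]            # indices of the k most similar words
            graph.append([(int(j), max(float(sims[i, j]), 0.0)) for j in nbrs])
        return graph

    def propagate(posteriors, graph, alpha=0.5, iterations=10):
        """Smooth per-word posteriors over latent annotations along the graph.

        Each word's distribution is interpolated between its EM estimate and the
        similarity-weighted average of its neighbours' distributions, then
        renormalized, so that similar words end up with similar distributions.
        """
        q = posteriors.copy()
        for _ in range(iterations):
            new_q = np.empty_like(q)
            for i, nbrs in enumerate(graph):
                total = sum(w for _, w in nbrs) + 1e-12
                nbr_avg = sum(w * q[j] for j, w in nbrs) / total
                new_q[i] = alpha * posteriors[i] + (1.0 - alpha) * nbr_avg
                new_q[i] /= new_q[i].sum()             # keep a valid distribution
            q = new_q
        return q

    # Example: 5 words with 8-dimensional embeddings and 4 latent subcategories.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(5, 8))
    post = rng.dirichlet(np.ones(4), size=5)           # EM posteriors q(a | word)
    smoothed = propagate(post, knn_graph(emb, k=2))
    ```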
  • Keywords
    context-free grammars; expectation-maximisation algorithm; graph theory; natural language processing; probability; Chinese; EM algorithm; English; PCFG-LA parameter estimation algorithm; Portuguese; expectation maximization; graph propagation technique; graph-based lexicon regularization; hierarchical split-and-merge training strategy; k-NN similarity graph; k-nearest neighbor; latent annotation; neural word representation; parsing tasks; part-of-speech tags; probabilistic context-free grammar; recursive autoencoder; semantic level; syntactic level; Grammar; Parameter estimation; Semantics; Syntactics; Training; Training data; Vectors; Graph propagation; natural language processing; neural word representation; syntax parsing
  • fLanguage
    English
  • Journal_Title
    Audio, Speech, and Language Processing, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    2329-9290
  • Type

    jour

  • DOI
    10.1109/TASLP.2015.2389034
  • Filename
    7001570