• DocumentCode
    2789889
  • Title
    An Experimental Study on Lexicalized Statistical Parsing for Vietnamese
  • Author
    Le, Anh-Cuong ; Nguyen, Phuong-Thai ; Vuong, Hoai-Thu ; Pham, Minh-Thu ; Ho, Tu-Bao
  • Author_Institution
    Coll. of Technol., Vietnam Nat. Univ. of Hanoi, Hanoi, Vietnam
  • fYear
    2009
  • fDate
    13-17 Oct. 2009
  • Firstpage
    162
  • Lastpage
    167
  • Abstract
    Syntactic parsing is a central and challenging problem in natural language processing. It has attracted many studies, and effective parsers consequently exist for several widely studied languages such as English and Chinese. For Vietnamese, only a few studies have addressed the problem; they do not apply modern techniques, and no widely used parser has been released. This paper presents the first study on developing a wide-coverage Vietnamese parser based on lexicalized probabilistic context-free grammar (LPCFG), using a standard parsed corpus (similar to the Penn Treebank). Bikel's parser is modified to analyze Vietnamese. We also compare different parsing models and different linguistic features. The best configuration achieves an F-score of around 78%.
  • Keywords
    context-free grammars; natural language processing; probability; Bikel parser; Penn Treebank; Vietnamese parsing; Vietnamese wide coverage parser; lexicalized probabilistic context free grammar; lexicalized statistical parsing; standard parsed corpus; Context modeling; Educational institutions; Knowledge engineering; Natural language processing; Natural languages; Power capacitors; Probability; Standards development; Systems engineering and theory; Training data;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Knowledge and Systems Engineering, 2009. KSE '09. International Conference on
  • Conference_Location
    Hanoi
  • Print_ISBN
    978-1-4244-5086-2
  • Electronic_ISBN
    978-0-7695-3846-4
  • Type
    conf
  • DOI
    10.1109/KSE.2009.41
  • Filename
    5361714