• DocumentCode
    3105829
  • Title

    Automatic Syntactic Segment Filtration for Mass Syntax Corpus with Mutual Information

  • Author

    Wang, Bo ; Zhao, Tiejun ; Yang, Muyun ; Li, Sheng

  • Author_Institution
    Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
  • fYear
    2009
  • fDate
    13-14 Dec. 2009
  • Firstpage
    234
  • Lastpage
    237
  • Abstract
    Syntactic analysis (syntactic parsing) is an important method in natural language processing. Syntactic parsing aims to find the linguistic structure of a sentence given the knowledge of a certain grammar. The constituent parser, which builds a hierarchical structure from phrase segments, is the most popular method in current NLP applications. Many approaches have been proposed to improve parsing algorithms and thereby the precision and recall of the syntactic segments found. In this paper, we propose a novel method that greatly improves the precision of the syntactic segments without digging into the parsing algorithms. The method is introduced as a post-processing step that filters the syntactic segments according to their mutual information with the context. The new method can obtain a high-confidence subset from a mass syntax corpus and is independent of the parsing algorithm. The effectiveness of the approach is validated by experimental results.
  • Keywords
    grammars; natural language processing; automatic syntactic segment filtration; mass syntax corpus; syntactic analysis; syntactic parsing; Computer science; Conference management; Engineering management; Filtration; Information analysis; Information management; Information technology; Mutual information; Technology management; parsing; syntax
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Future Information Technology and Management Engineering, 2009. FITME '09. Second International Conference on
  • Conference_Location
    Sanya
  • Print_ISBN
    978-1-4244-5339-9
  • Type

    conf

  • DOI
    10.1109/FITME.2009.64
  • Filename
    5380969