• DocumentCode
    1606341
  • Title
    Identification of Nominal Multiword Expressions in Bengali using CRF
  • Author
    Chakraborty, Tamal
  • Author_Institution
    Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, Kharagpur, India
  • fYear
    2012
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    One of the key issues in both natural language understanding and generation is the appropriate processing of Multiword Expressions (MWEs). MWEs pose a major problem for precise language processing because of their idiosyncratic nature and the diversity of their lexical, syntactic, and semantic properties. The meaning of an MWE may be derived transparently or opaquely from the meanings of its constituents. This paper addresses the identification of nominal Multiword Expressions in Bengali text using the Conditional Random Field (CRF) machine learning technique. Bengali is a highly agglutinative and morphologically rich language. Consequently, features such as surrounding words, POS tag, prefix, suffix, and length prove very effective when running the CRF tool to identify nominal MWEs. Compared with the statistical system previously built for identifying compound-noun MWEs in Bengali, the proposed system achieves higher precision, recall, and F-score. We also conclude that identifying Reduplicated MWEs (RMWEs) and using them as a feature yields a reasonable improvement over the earlier system.
  • Keywords
    learning (artificial intelligence); natural language processing; statistical analysis; text analysis; Bengali language; Bengali text; CRF; F-score; POS tag; RMWE; conditional random field machine learning technique; idiosyncratic nature; language processing; lexical properties; nominal multiword expression identification; reduplicated MWE; semantic properties; statistical system; surrounding words; syntactical properties; Compounds; Feature extraction; Labeling; Semantics; Standards; Testing; Training; Bengali; CRF; Multiword Expressions; Reduplications;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
  • Conference_Location
    Kharagpur
  • Print_ISBN
    978-1-4673-4367-1
  • Type
    conf
  • DOI
    10.1109/IHCI.2012.6481823
  • Filename
    6481823