Title :
Identification of Nominal Multiword Expressions in Bengali using CRF
Author :
Chakraborty, Tamal
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, Kharagpur, India
Abstract :
One of the key issues in both natural language understanding and generation is the appropriate processing of Multiword Expressions (MWEs). MWEs pose a major problem for precise language processing because of their idiosyncratic nature and their diversity in lexical, syntactic, and semantic properties. The semantics of an MWE may be transparent or opaque with respect to the combined semantics of its constituents. This paper deals with the identification of Nominal Multiword Expressions in Bengali text using the Conditional Random Field (CRF) machine learning technique. Bengali is a highly agglutinative and morphologically rich language. Features such as surrounding words, POS tag, prefix, suffix, and length therefore prove very effective when running the CRF tool to identify Nominal MWEs. Compared to the statistical system previously built for identifying compound noun MWEs in Bengali, our proposed system achieves higher precision, recall, and F-score. We also conclude that identifying Reduplicated MWEs (RMWEs) and using them as a feature yields a reasonable improvement over the earlier system.
Keywords :
learning (artificial intelligence); natural language processing; statistical analysis; text analysis; Bengali language; Bengali text; CRF; F-score; POS tag; RMWE; conditional random field machine learning technique; idiosyncratic nature; language processing; lexical properties; nominal multiword expression identification; reduplicated MWE; semantic properties; statistical system; surrounding words; syntactical properties; Compounds; Feature extraction; Labeling; Semantics; Standards; Testing; Training; Bengali; CRF; Multiword Expressions; Reduplications;
Conference_Title :
Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
Conference_Location :
Kharagpur
Print_ISBN :
978-1-4673-4367-1
DOI :
10.1109/IHCI.2012.6481823