DocumentCode
1606341
Title
Identification of Nominal Multiword Expressions in Bengali using CRF
Author
Chakraborty, Tamal
Author_Institution
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol., Kharagpur, Kharagpur, India
fYear
2012
Firstpage
1
Lastpage
6
Abstract
One of the key issues in both natural language understanding and generation is the appropriate processing of Multiword Expressions (MWEs). MWEs pose a major problem for precise language processing because of their idiosyncratic nature and the diversity of their lexical, syntactic and semantic properties. The semantics of an MWE may be transparent or opaque with respect to the combined semantics of its constituents. This paper deals with the identification of Nominal Multiword Expressions in Bengali text using the Conditional Random Field (CRF) machine learning technique. Bengali is a highly agglutinative and morphologically rich language. Features such as surrounding words, POS tag, prefix, suffix and length therefore prove very effective when running the CRF tool to identify Nominal MWEs. Compared to the statistical system built for compound noun MWE identification in Bengali, our proposed system shows higher accuracy in terms of precision, recall and F-score. We also conclude that identifying Reduplicated MWEs (RMWEs) and using them as a feature yields a reasonable improvement over the earlier system.
Keywords
learning (artificial intelligence); natural language processing; statistical analysis; text analysis; Bengali language; Bengali text; CRF; F-score; POS tag; RMWE; conditional random field machine learning technique; idiosyncratic nature; language processing; lexical properties; nominal multiword expression identification; reduplicated MWE; semantic properties; statistical system; surrounding words; syntactical properties; Compounds; Feature extraction; Labeling; Semantics; Standards; Testing; Training; Bengali; CRF; Multiword Expressions; Reduplications;
fLanguage
English
Publisher
ieee
Conference_Titel
Intelligent Human Computer Interaction (IHCI), 2012 4th International Conference on
Conference_Location
Kharagpur
Print_ISBN
978-1-4673-4367-1
Type
conf
DOI
10.1109/IHCI.2012.6481823
Filename
6481823