Title :
Design & development of rule based inflectional and derivational Urdu stemmer ‘Usal’
Author :
Gupta, Vaishali ; Joshi, Nisheeth ; Mathur, Iti
Author_Institution :
Dept. of Comput. Sci. & Eng. IES, IPS Acad., Indore, India
Abstract :
Urdu is a morphologically rich language, meaning that Urdu words have many variant forms. Morphology, the study of word structure, plays an important role in Natural Language Processing. In this paper, we focus on the Urdu language and develop a rule-based inflectional and derivational Urdu stemmer. Stemming is a branch of morphology: in general, it is the process of extracting the 'root' word from its surface form by separating the affixes. This simple rule-based stemming algorithm raises the problems of under-stemming and over-stemming. To reduce under-stemming, we use a longest-suffix-stripping algorithm; to reduce over-stemming, we have created a database of exception words and stop-words.
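The two remedies named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the Usal implementation: the suffix list, exception words, and stop-words are hypothetical romanized placeholders, the function name `stem` and the `min_stem_len` guard are assumptions, and returning stop-words and exception words unchanged is one possible reading of how the database is used.

```python
# Hypothetical, romanized toy data for illustration only;
# this is NOT the rule set or word database from the paper.
SUFFIXES = ("iyan", "on", "en", "an")
EXCEPTIONS = {"insan"}           # words that only look as if they carry a suffix
STOP_WORDS = {"ka", "ki", "hai"}

# Try longer suffixes first so "iyan" is matched before the shorter "an".
_SUFFIXES_LONGEST_FIRST = sorted(SUFFIXES, key=len, reverse=True)


def stem(word, min_stem_len=2):
    """Strip the longest matching suffix unless the word is protected."""
    # Stop-words and exception words are returned unchanged (assumption),
    # which prevents over-stemming of words such as "insan".
    if word in STOP_WORDS or word in EXCEPTIONS:
        return word
    for suffix in _SUFFIXES_LONGEST_FIRST:
        # The minimum stem length is an added safeguard, not from the paper.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("kitabon", "kitaben", "larkiyan", "insan", "hai"):
        print(w, "->", stem(w))
```

Without the exception set, "insan" would lose its final "an"; without the longest-first ordering, "larkiyan" would lose only "an" and keep part of the suffix. A real stemmer would work on Urdu script and also handle prefixes, which this sketch omits.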
Keywords :
natural language processing; longest suffix stripping algorithm; morphologically rich language; rule based derivational Urdu stemmer Usal; rule based inflectional Urdu stemmer Usal; rule based stemming algorithm; Accuracy; Algorithm design and analysis; Databases; Knowledge management; Market research; Morphology; Derivational; Inflectional; Root; Stemmer; affix; stopwords
Conference_Title :
Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE), 2015 International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-8432-9
DOI :
10.1109/ABLAZE.2015.7154958