Title :
Learning Disambiguation of Hindi Morpheme "vaalaa" with a Sparse Corpus
Author :
Sinha, R. Mahesh K
Author_Institution :
Indian Inst. of Technol., Kanpur, India
Abstract :
The Hindi morpheme "vaalaa" is very widely used, both as a suffix and as a separate word. Its most common use as a suffix is to denote a person's activity or profession; this usage has been borrowed into English with the spelling "wallah". However, the morpheme has a large number of other interpretations depending on the context in which it is used. This paper gives an account of the different senses in which the morpheme is used in Hindi and presents a strategy for learning to disambiguate them from contextual features with sparse data, using a semi-supervised method. We present a new technique that unifies instances learned through supervised training on limited data, computes a matching index, and bootstraps the training set to deal with corpus sparseness. This study finds application in machine translation, information retrieval, text understanding and text summarization.
Keywords :
learning (artificial intelligence); natural language processing; Hindi morpheme vaalaa; contextual feature; semisupervised learning; sparse corpus; suffix; word sense disambiguation; Books; Data mining; Information retrieval; Iron; Machine learning; Ontologies; Pattern matching; Vehicles; Hindi morpheme "vaalaa"; semi-supervised learning; sense disambiguation
Conference_Title :
Machine Learning and Applications, 2009. ICMLA '09. International Conference on
Conference_Location :
Miami Beach, FL
Print_ISBN :
978-0-7695-3926-3
DOI :
10.1109/ICMLA.2009.130