Title :
Shallow Parsing for Hindi - An extensive analysis of sequential learning algorithms using a large annotated corpus
Author :
Gahlot, Himanshu ; Krishnarao, Awaghad Ashish ; Kushwaha, D.S.
Author_Institution :
Motilal Nehru Nat. Inst. of Technol., Allahabad
Abstract :
In this paper, we provide the first comprehensive comparison of methods for part-of-speech tagging and chunking for Hindi. We analyze the application of three major learning algorithms (viz. maximum entropy (Maxent) models [2] [9], conditional random fields (CRFs) [12], and support vector machines (SVMs) [8]) to part-of-speech tagging and chunking for the Hindi language using datasets of different sizes. The use of language-independent features makes this analysis more general and lets it yield important results for similar South and South East Asian languages. The results show that CRFs outperform SVMs and Maxent in terms of accuracy. We achieve an accuracy of 92.26% for part-of-speech tagging and 93.57% for chunking using the conditional random fields algorithm. The corpus we used had 138,177 annotated instances for training. We report results for the three learning algorithms under varying conditions (clustering, BIEO notation vs. BIES notation, multiclass methods for SVMs, etc.) and present an extensive analysis of the whole process. These results will give future researchers insight into how to shape their research, keeping in mind the comparative performance of the major algorithms on datasets of various sizes and under various conditions.
Keywords :
grammars; learning (artificial intelligence); natural language processing; support vector machines; word processing; Hindi language; South Asian Languages; South East Asian Languages; conditional random fields algorithm; language independent features; large annotated corpus; part-of-speech chunking; part-of-speech tagging; sequential learning algorithms; shallow parsing; Algorithm design and analysis; Clustering algorithms; Entropy; Hidden Markov models; Machine learning; Natural languages; Speech; Stochastic processes; Support vector machines; Tagging;
Conference_Title :
Advance Computing Conference, 2009. IACC 2009. IEEE International
Conference_Location :
Patiala
Print_ISBN :
978-1-4244-2927-1
Electronic_ISBN :
978-1-4244-2928-8
DOI :
10.1109/IADCC.2009.4809178