Title :
A shallow parser for Tamil
Author :
Ariaratnam, I. ; Weerasinghe, A.R. ; Liyanage, C.
Author_Institution :
Sch. of Comput., Univ. of Colombo, Colombo, Sri Lanka
Abstract :
This research attempts to build a shallow parser that assigns a partial structure to Sri Lankan Tamil sentences in order to recover useful syntactic information from them. The parser combines a maximum entropy based part-of-speech (POS) tagger, which automatically labels each word in a sentence with the appropriate POS tag, with a rule-based chunker, which segments sentences into syntactically correlated word groups without the need for a large annotated corpus. To do this, we developed a POS tagset of 20 tags using expert input, manually annotated a corpus of approximately 12,500 words, and identified 390 chunk patterns for extracting the chunks. Our POS tagger and chunker achieved promising F-measures of 81.72% and 78.3% respectively. The combined shallow parser gives a lower F-measure of 66.6%, owing to error propagation between the two stages.
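As an illustration of the rule-based chunking step described in the abstract, the following is a minimal Python sketch that groups a POS-tagged sentence into chunks by matching tag-sequence patterns, longest pattern first. The tag names, chunk labels, patterns, and the `chunk` function are all hypothetical placeholders chosen for this example; they are not the paper's 20-tag tagset or its 390 chunk patterns, and the matching strategy is an assumption rather than the authors' method.

```python
from typing import Dict, List, Tuple

# Hypothetical chunk patterns: chunk label -> POS tag sequences.
# Illustrative only; not the tagset or patterns used in the paper.
PATTERNS: Dict[str, List[Tuple[str, ...]]] = {
    "NP": [("DT", "JJ", "NN"), ("JJ", "NN"), ("NN", "NN"), ("NN",)],
    "VP": [("RB", "VB"), ("VB", "AUX"), ("VB",)],
}


def chunk(tagged: List[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Greedy left-to-right chunking over (word, tag) pairs.

    At each position the longest matching pattern wins; a word that
    matches no pattern is emitted alone with the label "O".
    """
    # Flatten the pattern table and try longer patterns first.
    rules = sorted(
        ((seq, label) for label, seqs in PATTERNS.items() for seq in seqs),
        key=lambda rule: -len(rule[0]),
    )
    tags = [tag for _, tag in tagged]
    chunks: List[Tuple[str, List[str]]] = []
    i = 0
    while i < len(tagged):
        for seq, label in rules:
            n = len(seq)
            if tuple(tags[i:i + n]) == seq:
                chunks.append((label, [word for word, _ in tagged[i:i + n]]))
                i += n
                break
        else:
            # No pattern starts at this position.
            chunks.append(("O", [tagged[i][0]]))
            i += 1
    return chunks


if __name__ == "__main__":
    # Placeholder tokens stand in for the words of a tagged sentence.
    sentence = [("w1", "DT"), ("w2", "JJ"), ("w3", "NN"),
                ("w4", "RB"), ("w5", "VB"), ("w6", "PUNC")]
    for label, words in chunk(sentence):
        print(label, words)
```

In a pipeline of this shape the chunker sees only the tags the tagger assigns, so a single wrong tag can prevent a pattern from matching, which is one way tagging errors carry over into chunking.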
Keywords :
grammars; maximum entropy methods; natural language processing; POS tagger; POS tagset; Sri Lankan Tamil sentences; corpus annotation; f-measure; maximum entropy based part-of-speech tagger; natural language sentences; partial structure; rule-based chunker; shallow parser; useful syntactic information recovery; Chunking; Maximum Entropy Model; Part-of-speech Tagging; Partial Parsing; Shallow Parsing; Tamil Language Processing
Conference_Title :
Advances in ICT for Emerging Regions (ICTer), 2014 International Conference on
Print_ISBN :
978-1-4799-7731-4
DOI :
10.1109/ICTER.2014.7083901