DocumentCode :
1954072
Title :
Urdu Noun Phrase Chunking - Hybrid Approach
Author :
Siddiq, Shahid ; Hussain, Sarmad ; Ali, Aasim ; Malik, Kamran ; Ali, Wajid
Author_Institution :
FAST, NUCES, Lahore, Pakistan
fYear :
2010
fDate :
28-30 Dec. 2010
Firstpage :
69
Lastpage :
72
Abstract :
This work uses chunking to mark the noun phrases in Urdu sentences. The approach is hybrid, combining a statistical method with hand-crafted rules. The statistical model is a hidden Markov model (HMM) used with IOB chunk annotation. From a POS-tagged corpus of 100,000 words, around 90,000 word tokens are used for training and 10,000 word tokens for testing. Several experiments with different combinations of input, output, and rule application patterns are conducted to achieve high accuracy. An overall accuracy of 97.52% is achieved using the TnT tagger. The most successful input sequence is the one that merges POS annotation with IOB annotation.
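The abstract names IOB chunk annotation and an input that merges POS annotation with IOB annotation. The following is a minimal sketch of what such an encoding could look like, not the authors' implementation. The function names, the `_` separator, the bare `B`/`I`/`O` labels, and the example POS tags are all assumptions chosen for illustration; the abstract does not specify them.

```python
def iob_tags(n_tokens, np_spans):
    """Return one IOB tag per token.

    np_spans holds (start, end) token indices of noun phrases, end exclusive.
    B marks the first token of a noun phrase, I a token inside one,
    and O a token outside any noun phrase.
    """
    tags = ["O"] * n_tokens
    for start, end in np_spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def merge_pos_iob(pos_tags, iob, sep="_"):
    """Join each POS tag with its IOB tag into one combined label."""
    return [f"{p}{sep}{c}" for p, c in zip(pos_tags, iob)]


def split_merged(merged, sep="_"):
    """Recover (POS, IOB) pairs from combined labels."""
    return [tuple(m.rsplit(sep, 1)) for m in merged]


# Hypothetical three-token sentence: adjective, noun, verb,
# with one noun phrase covering the first two tokens.
pos = ["ADJ", "NN", "VB"]
chunks = iob_tags(len(pos), [(0, 2)])
merged = merge_pos_iob(pos, chunks)
print(merged)
```

A sequence of combined labels like this could then serve as the tag sequence for an HMM tagger, so that the chunk label is predicted together with, or conditioned on, the part of speech.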
Keywords :
natural language processing; word processing; POS annotation; Urdu sentence; hand crafted rule; noun phrase chunking; statistical method; Accuracy; Computational modeling; Hidden Markov models; Probabilistic logic; Tagging; Testing; Training; Hybrid Approach; Noun Phrase Chunking; Part of Speech; Precision; Recall
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
Type :
conf
DOI :
10.1109/IALP.2010.71
Filename :
5681546