Title :
Pattern-based algorithm for Part-of-Speech tagging Arabic text
Author :
Alqrainy, Shihadeh ; AlSerhan, Hasan Muaidi ; Ayesh, Aladdin
Author_Institution :
Prince Abdullah Bin Ghazi Fac. of Sci. & Inf. Technol., AlBalqa Appl. Univ., Amman
Abstract :
Building a generic Part-of-Speech (POS) tagger without a lexicon (dictionary) depends on the language and on the characteristics of its grammar, namely its morphological and syntactical systems. The Arabic language has a valuable feature called diacritics: marks placed above and below the letters of an Arabic word. This paper presents a novel algorithm that assigns the correct POS tag to words belonging to the verb or noun class in Arabic text. The algorithm relies on the pattern (wazn) of the word rather than on a huge manually tagged lexicon from which large amounts of training data can be extracted. To evaluate its accuracy, an experiment was run on a data set of 5,000 words belonging to the noun and verb classes. The algorithm achieved an accuracy of 91%.
Keywords :
natural language processing; text analysis; data set; diacritics; noun class; part-of-speech tagging Arabic text; pattern-based algorithm; Data mining; Dictionaries; Information technology; Labeling; Radio access networks; Speech recognition; Speech synthesis; Tagging; Testing; Training data; Arabic Language; Diacritics; Morphological; Part-Of-Speech (POS); Syntactical; Tag set
Conference_Title :
Computer Engineering & Systems, 2008. ICCES 2008. International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-2115-2
Electronic_ISBN :
978-1-4244-2116-9
DOI :
10.1109/ICCES.2008.4772979