Title :
Restoration of Arabic diacritics using dynamic programming
Author_Institution :
Univ. of Helwan, Egypt
Abstract :
Arabic script can be written with or without diacritics. Normally, Arabic text is written without diacritics (e.g., in Arabic newspapers). When diacritics are present, the script provides enough information to determine the correct pronunciation and meaning of words. Assigning the correct diacritics to Arabic words is a complex task involving morphological, syntactic, and semantic processing. The goal of this research is to develop an automatic system that assigns diacritics to Arabic words. The presented technique is a purely statistical approach that depends only on an Arabic corpus annotated with diacritics. In this paper, we present an algorithm that restores Arabic diacritics using a dynamic programming approach. The possible diacritized word sequences are assigned scores using a statistical n-gram language model. Using these scores, the most likely sequence can be found with a dynamic programming algorithm. When case endings (i.e., the diacritic mark on the last letter of a word) are ignored, preliminary results on a public-domain corpus show that the algorithm can achieve good results.
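The pipeline summarized in the abstract (generate candidate diacritized forms, score sequences with an n-gram model, search with dynamic programming) can be sketched as a Viterbi search over a word lattice. This is a minimal illustrative sketch, not the authors' implementation. It assumes: a bigram model with add-one smoothing (the abstract does not state the n-gram order or smoothing method); candidates collected from the annotated training corpus; and toy words in Buckwalter-style transliteration, where only the short vowels a, i, u and sukun o are treated as diacritics. All function names are hypothetical. A helper also shows what "ignoring case endings" means for evaluation.

```python
import math
from collections import Counter, defaultdict

# Toy diacritic set: Buckwalter fatha, kasra, damma, sukun (assumption; no shadda/tanween).
SHORT_VOWELS = "aiuo"

def strip_diacritics(form):
    """Map a diacritized token to its undiacritized spelling."""
    return "".join(ch for ch in form if ch not in SHORT_VOWELS)

def build_candidates(sentences):
    """Collect every diacritized form seen in training for each undiacritized word."""
    candidates = defaultdict(set)
    for sent in sentences:
        for form in sent:
            candidates[strip_diacritics(form)].add(form)
    return {word: sorted(forms) for word, forms in candidates.items()}

def train_bigram(sentences):
    """Count unigrams and bigrams; <s> marks the sentence start."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_prob(prev, form, unigrams, bigrams, vocab_size):
    # Add-one (Laplace) smoothed bigram log-probability (assumed smoothing choice).
    return math.log((bigrams[(prev, form)] + 1) / (unigrams[prev] + vocab_size))

def restore(words, candidates, unigrams, bigrams):
    """Viterbi search for the highest-scoring diacritized sequence."""
    vocab_size = len(unigrams)
    # Each lattice column maps a candidate form to (best log score, best previous form).
    lattice = [{"<s>": (0.0, None)}]
    for word in words:
        forms = candidates.get(word, [word])  # unseen word: leave it undiacritized
        column = {}
        for form in forms:
            column[form] = max(
                (score + log_prob(prev, form, unigrams, bigrams, vocab_size), prev)
                for prev, (score, _) in lattice[-1].items()
            )
        lattice.append(column)
    # Backtrack from the best form in the last column.
    form = max(lattice[-1], key=lambda f: lattice[-1][f][0])
    path = []
    for column in reversed(lattice[1:]):
        path.append(form)
        form = column[form][1]
    return path[::-1]

def drop_case_ending(form):
    """Remove trailing diacritics on the last letter (the case ending)."""
    return form.rstrip(SHORT_VOWELS)

def word_accuracy(predicted, reference, ignore_case_ending=True):
    """Fraction of words restored correctly, optionally ignoring case endings."""
    norm = drop_case_ending if ignore_case_ending else (lambda f: f)
    hits = sum(norm(p) == norm(r) for p, r in zip(predicted, reference))
    return hits / len(reference)

if __name__ == "__main__":
    train = [["kataba", "Alwaladu", "Aldarsa"], ["kutiba", "Aldarsu"]]
    unigrams, bigrams = train_bigram(train)
    candidates = build_candidates(train)
    print(restore(["ktb", "Alwld", "Aldrs"], candidates, unigrams, bigrams))
    # prints ['kataba', 'Alwaladu', 'Aldarsa']
```

Dynamic programming keeps only the best-scoring path into each candidate form, so the search cost grows linearly with sentence length (times the square of the number of candidates per word) instead of exponentially with the number of candidate combinations. A higher-order n-gram model would require the lattice states to track longer histories.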
Keywords :
dynamic programming; natural language processing; statistical analysis; text analysis; Arabic corpus annotation; Arabic diacritics restoration; Arabic newspaper; Arabic script; Arabic text; diacritics assignment; dynamic programming algorithm; morphology; pronunciation; score assignment; semantic processing; statistical approach; statistical n-gram language modeling approach; syntax; word meaning; word sequences; Dynamic programming; Heuristic algorithms; Hidden Markov models; Mathematical model; Smoothing methods; Syntactics; Training;
Conference_Title :
Computer Engineering & Systems (ICCES), 2013 8th International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4799-0078-7
DOI :
10.1109/ICCES.2013.6707161