Title :
Fassieh®, a Semi-Automatic Visual Interactive Tool for Morphological, PoS-Tags, Phonetic, and Semantic Annotation of Arabic Text Corpora
Author :
Attia, Mohamed ; Rashwan, Mohsen A A ; Al-Badrashiny, Mohamed A S A A
Author_Institution :
Eng. Co. for the Dev. of Comput. Syst., Res. & Dev. Int., Giza
fDate :
7/1/2009 12:00:00 AM
Abstract :
This paper introduces an Arabic text annotation tool called Fassieh®. Via a sophisticated interactive GUI application, Fassieh® makes it easy to build large, structured corpora of standard written Arabic, then allows the production of fundamental linguistic analyses, i.e., language factorizations, at high coverage and accuracy rates over such corpora. Arabic morphological analysis, part-of-speech (PoS) tagging, full phonetic transcription (diacritization), and lexical semantics analysis are the most significant Arabic language factorizations currently supported by Fassieh®. The high inherent ambiguity of these analyses is statistically resolved in Fassieh®, which also affords a multitude of auxiliary features enabling guided, normalized, and efficient proofreading of any part of the factorized corpus. The paper first reviews the highly inflective and derivative nature of the Arabic language, our Arabic language factorization models, and the associated statistical disambiguation methodology. Afterwards, we present Fassieh®, which is not only a text annotation tool but also an evaluation, demonstrative, and tutorial means of Arabic natural language processing (NLP).
Keywords :
natural language processing; speech processing; Arabic text corpora; Fassieh; GUI application; PoS tagging; associated statistical disambiguation methodology; auxiliary features; full phonetic transcription; fundamental linguistic analyses; language factorizations; lexical semantics analysis; morphological analysis; part-of-speech tagging; semantic annotation; semiautomatic visual interactive tool; Computer errors; Dictionaries; Graphical user interfaces; Morphology; Production; Research and development; Stochastic processes; Tagging; Text recognition; Annotation; Arabic; diacritization; interactive text annotation; language factorization; lexical analysis; lexical semantics; lexicon; maximum a posteriori (MAP); natural language processing (NLP); noisy channel model; part-of-speech (PoS) tagging; phonetic transcription; phonology; search trellis; semantic analysis; statistical disambiguation; statistical language modeling (SLM); stochastic modeling; text annotation tools; written language processing
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
DOI :
10.1109/TASL.2009.2019298