Title :
A Dependency Treebank of the Quran using traditional Arabic grammar
Author :
Dukes, Kais ; Buckwalter, Tim
Author_Institution :
Sch. of Comput., Univ. of Leeds, Leeds, UK
Abstract :
The Quran is a significant religious text, followed by the 1.5 billion believers of the Islamic faith worldwide. The text dates to 610-632 CE and is written in Quranic Arabic, the direct ancestor of the Modern Standard Arabic in use today. This paper presents the Quranic Arabic Dependency Treebank (QADT) and reports on the approaches and solutions used to apply Natural Language Processing to the unique and challenging language of the Quran. The project differs from other Arabic treebanks by providing a deep computational linguistic model based on historical traditional Arabic grammar. The treebank is part of the Quranic Arabic Corpus (http://corpus.quran.com), a popular free Arabic resource developed at the University of Leeds. Motivated by the importance of the Quran as a central religious text, we also report on how online collaborative annotation was used to bring together Quranic scholars and Arabic language experts to ensure a high level of accuracy in the grammatical analysis of the entire Quran.
Keywords :
grammars; natural language processing; Arabic grammar; Quran dependency treebank; grammatical analysis; Computational linguistics; Computational modeling; Educational institutions; Morphology; Online Communities/Technical Collaboration; Performance analysis; Spatial databases; Tagging; Tree graphs; Arabic; Corpus; Dependency Grammar; Part-of-Speech Tagging; Quran; Treebank Syntax
Conference_Title :
Informatics and Systems (INFOS), 2010 The 7th International Conference on
Conference_Location :
Cairo
Print_ISBN :
978-1-4244-5828-8