Title :
Development of Pashto Treebank
Author :
Ali, Raian ; Khan, Muhammad Asad ; Khan, Muhammad Asad
Author_Institution :
Quaid-e-Azam Coll. of Commerce, Univ. of Peshawar, Peshawar, Pakistan
Abstract :
This paper describes the development of a Pashto Treebank encoded in Extensible Markup Language (XML). A chart parser has been developed that uses the chart parsing algorithm to build parse trees for Pashto sentences. The parser takes two inputs, a Context Free Grammar (CFG) of the Pashto language and tagged input text, and produces parsed text in one of three forms: a reduced graph, a parse tree, or XML code. The system has been tested on real-world text taken from Pashto novels and web sites and tagged manually. Of the eighty-seven (87) sentences parsed, fifty-four (54) were correctly parsed with a single parse tree and the remaining 33 were parsed with multiple trees, giving an accuracy of 62.06%.
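The pipeline in the abstract (CFG plus tagged text in, parse trees and XML out) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the grammar is a hypothetical toy with binary rules only, the tags and the tokens `w1`..`w8` are placeholders rather than Pashto data, and the CKY-style table is just one member of the chart parsing family, since the abstract does not say which variant the authors implemented. The last function mirrors the evaluation criterion of counting a sentence as correct when it receives exactly one tree.

```python
from collections import defaultdict
from xml.etree import ElementTree as ET

# Hypothetical toy grammar (binary rules only); NOT the paper's Pashto CFG.
GRAMMAR = [
    ("S", ("NP", "VP")),
    ("NP", ("ADJ", "N")),
    ("NP", ("NP", "PP")),
    ("VP", ("NP", "V")),
    ("VP", ("PP", "VP")),
    ("PP", ("NP", "P")),
]


def chart_parse(tagged, start="S"):
    """Return every parse tree for a list of (word, tag) pairs."""
    n = len(tagged)
    # chart[(i, j, symbol)] holds all trees for `symbol` spanning tokens i..j-1
    chart = defaultdict(list)
    for i, (word, tag) in enumerate(tagged):
        chart[i, i + 1, tag].append((tag, word))
    # Fill the chart bottom-up, from short spans to the full sentence.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for lhs, (b, c) in GRAMMAR:
                    for left in chart.get((i, k, b), ()):
                        for right in chart.get((k, j, c), ()):
                            chart[i, j, lhs].append((lhs, left, right))
    return chart.get((0, n, start), [])


def to_xml(tree):
    """Convert a (label, word) leaf or (label, left, right) node to XML."""
    elem = ET.Element(tree[0])
    if isinstance(tree[1], str):
        elem.text = tree[1]
    else:
        for child in tree[1:]:
            elem.append(to_xml(child))
    return elem


def accuracy(tree_counts):
    """Percentage of sentences that received exactly one parse tree."""
    return 100.0 * sum(1 for c in tree_counts if c == 1) / len(tree_counts)


# Placeholder tagged sentences: one unambiguous, one with two readings.
simple = [("w1", "ADJ"), ("w2", "N"), ("w3", "ADJ"), ("w4", "N"), ("w5", "V")]
ambiguous = [("w1", "ADJ"), ("w2", "N"), ("w3", "ADJ"), ("w4", "N"),
             ("w5", "P"), ("w6", "ADJ"), ("w7", "N"), ("w8", "V")]

simple_trees = chart_parse(simple)
ambiguous_trees = chart_parse(ambiguous)
xml_code = ET.tostring(to_xml(simple_trees[0]), encoding="unicode")
print(xml_code)
print(len(simple_trees), len(ambiguous_trees))
```

In this sketch the second sentence gets two trees because the postpositional phrase can attach either to the first noun phrase or to the verb phrase, which is the kind of ambiguity that would place a sentence among the 33 with multiple trees. Calling `accuracy([1] * 54 + [2] * 33)` reproduces the ratio 54/87, which is about 62.07% when rounded and matches the reported 62.06% when truncated to two decimals.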
Keywords :
Web sites; XML; computational linguistics; context-free grammars; natural language processing; text analysis; Pashto language; Pashto novels; Pashto sentences; Pashto treebank; XML code; chart parsing algorithm; context free grammar; extensible markup language code; reduced graph; tagged input text; Argon; Humans; Nickel; Testing; Corpus; Parser; Parsing; Pashto; Treebank
Conference_Title :
Computer Networks and Information Technology (ICCNIT), 2011 International Conference on
Conference_Location :
Abbottabad
Print_ISBN :
978-1-61284-940-9
DOI :
10.1109/ICCNIT.2011.6020939