DocumentCode :
3326416
Title :
Development of Pashto Treebank
Author :
Ali, Raian ; Khan, Muhammad Asad ; Khan, Muhammad Asad
Author_Institution :
Quaid-e-Azam Coll. of Commerce, Univ. of Peshawar, Peshawar, Pakistan
fYear :
2011
fDate :
11-13 July 2011
Firstpage :
257
Lastpage :
262
Abstract :
This paper describes the development of a Pashto treebank encoded in Extensible Markup Language (XML). A chart parser has been developed that uses the chart parsing algorithm to build parse trees for Pashto sentences. The parser's output is the parsed text, available in any of three forms: a reduced graph, a parse tree, or XML code. As input, the parser requires a context-free grammar (CFG) of the Pashto language and tagged input text. The system was tested on real-world text taken from Pashto novels and web sites and tagged manually. Of the eighty-seven (87) sentences parsed, fifty-four (54) were correctly parsed with a single parse tree and the remaining 33 were parsed with multiple trees, giving an accuracy of 62.06%.
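The pipeline summarized in the abstract (tagged text plus a CFG in, parse trees and XML out) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the toy grammar, the tag names (`N`, `V`, `ADJ`), the placeholder words, and the function names `parse` and `to_xml` are hypothetical and are not taken from the paper, whose actual Pashto grammar and chart parsing variant are not given in this record. The sketch uses a memoized table keyed by symbol and span as its chart and supports only unary and binary rules.

```python
from functools import lru_cache
import xml.etree.ElementTree as ET

# Hypothetical toy grammar over POS tags (NOT the paper's Pashto CFG).
# Each right-hand side has one or two symbols; verb-final order is assumed.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("ADJ", "N")],
    "VP": [("NP", "V"), ("V",)],
}

def parse(tagged):
    """Return every parse tree rooted in S that spans the whole tagged input.

    `tagged` is a list of (word, tag) pairs. A tree is a nested tuple:
    (tag, word) for a leaf, (symbol, child, ...) otherwise.
    """
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]

    @lru_cache(maxsize=None)  # the cache plays the role of the chart
    def trees(sym, i, j):
        results = []
        # A POS tag covers exactly one token.
        if j == i + 1 and tags[i] == sym:
            results.append((sym, words[i]))
        for rhs in GRAMMAR.get(sym, []):
            if len(rhs) == 1:
                results += [(sym, t) for t in trees(rhs[0], i, j)]
            else:
                # Try every split point between the two children.
                for k in range(i + 1, j):
                    for left in trees(rhs[0], i, k):
                        for right in trees(rhs[1], k, j):
                            results.append((sym, left, right))
        return tuple(results)

    return list(trees("S", 0, len(tags)))

def to_xml(tree):
    """Convert a nested-tuple parse tree into an XML element."""
    elem = ET.Element(tree[0])
    for child in tree[1:]:
        if isinstance(child, str):
            elem.text = child          # leaf: the word itself
        else:
            elem.append(to_xml(child))
    return elem

if __name__ == "__main__":
    # Placeholder tokens stand in for manually tagged Pashto words.
    sentence = [("w1", "N"), ("w2", "N"), ("w3", "V")]
    found = parse(sentence)
    print(len(found), "parse tree(s)")
    for tree in found:
        print(ET.tostring(to_xml(tree), encoding="unicode"))
```

Under the evaluation described in the abstract, a sentence counts as correctly parsed when exactly one tree is returned, so `len(parse(sentence)) == 1` is the relevant check in this sketch. The reported figure follows from 54/87 ≈ 0.6207; the abstract's 62.06% appears to be that value truncated rather than rounded.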
Keywords :
Web sites; XML; computational linguistics; context-free grammars; natural language processing; text analysis; Pashto language; Pashto novels; Pashto sentences; Pashto treebank; XML code; chart parsing algorithm; context free grammar; extensible markup language code; reduced graph; tagged input text; Argon; Humans; Nickel; Testing; Corpus; Parser; Parsing; Pashto; Treebank
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Networks and Information Technology (ICCNIT), 2011 International Conference on
Conference_Location :
Abbottabad
ISSN :
2223-6317
Print_ISBN :
978-1-61284-940-9
Type :
conf
DOI :
10.1109/ICCNIT.2011.6020939
Filename :
6020939