Title of article :
Feature Engineering in Persian Dependency Parser
Author/Authors :
Ebrahimpour-Komleh, H. (Department of Computer Engineering, University of Kashan, Kashan, Iran); Lazemi, S. (Department of Computer Engineering, University of Kashan, Kashan, Iran)
Abstract :
A dependency parser is one of the most fundamental tools in natural language processing: it extracts the structure of sentences and determines the relations between words according to dependency grammar. Dependency parsing is well suited to free-word-order languages such as Persian. In this work, a data-driven dependency parser for Persian is developed with the help of a phrase-structure parser. The feature space defined for a parser is one of the important factors in its success. Our goal is to generate and extract appropriate features for dependency parsing of Persian sentences. To achieve this goal, new semantic and syntactic features are defined and added to MSTParser by the stacking method. The semantic features are obtained using word clustering algorithms based on syntagmatic analysis; the syntactic features are obtained using the Persian phrase-structure parser and are used as bit strings. Experiments are conducted on the Persian Dependency Treebank (PerDT) and the Uppsala Persian Dependency Treebank (UPDT). The results indicate that the new features improve the performance of the dependency parser for Persian. The unlabeled attachment scores achieved on PerDT and UPDT are 89.17% and 88.96%, respectively.
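The reported metric, unlabeled attachment score (UAS), is conventionally the percentage of tokens whose predicted head matches the gold head, ignoring the dependency label. The sketch below illustrates that standard definition only. The function name, the list-of-head-indices input format, and the toy data are assumptions made for illustration and are not taken from the article, whose exact evaluation setup (for example, punctuation handling) is not stated in the abstract.

```python
from typing import List


def unlabeled_attachment_score(gold: List[List[int]], pred: List[List[int]]) -> float:
    """Return UAS as a percentage over all tokens in all sentences.

    Each sentence is a list of head indices, one per token, with 0 standing
    for the artificial root. This input format is an illustrative assumption.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    total = 0
    correct = 0
    for gold_heads, pred_heads in zip(gold, pred):
        if len(gold_heads) != len(pred_heads):
            raise ValueError("sentence length mismatch between gold and pred")
        total += len(gold_heads)
        # A token counts as correct when its predicted head equals the gold head.
        correct += sum(g == p for g, p in zip(gold_heads, pred_heads))

    return 100.0 * correct / total if total else 0.0


# Toy data (hypothetical): two sentences, five tokens, one wrong head.
gold_heads = [[2, 0, 2], [0, 1]]
pred_heads = [[2, 0, 1], [0, 1]]
score = unlabeled_attachment_score(gold_heads, pred_heads)
print(f"UAS = {score:.2f}%")
```

Under this definition, a score of 89.17% would mean that roughly 89 of every 100 evaluated tokens were attached to the correct head.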
Keywords :
Stacking, Persian, MSTParser, Phrase-structure parser, Dependency parser
Journal title :
Astroparticle Physics