Automatic Conversion of a Persian Dependency Treebank to a Constituency Treebank

Title in another language

Converting Dependency Treebank to Constituency Treebank for Persian

Authors

Pouramini, Ahmad (Sirjan University of Technology, Faculty of Electrical and Computer Engineering); Ghayoomi, Masood (Institute for Humanities and Cultural Studies, Tehran); Naseri, Amineh (Sirjan University of Technology, Faculty of Electrical and Computer Engineering)

Keywords

Natural language processing, linguistic corpus, dependency treebank, constituency treebank

Persian abstract (translated)

Treebanks are typically built in one of two forms: dependency-based or constituency-based. Both structures are used in linguistics and natural language processing. Several dependency treebanks currently exist for Persian, but no large constituency treebank is available for the language. In this paper we present a method, based on an existing algorithm, for converting a dependency treebank into its constituency equivalent. The base algorithm uses a set of conversion rules to find the constituency subtrees that correspond to dependency edges, and combines them to produce the final constituency structure. We apply this algorithm to Persian dependency structures, report the results, and propose modifications to improve its performance. We show that traversing the dependency edges in a particular direction affects the quality of the algorithm's output. We also propose modifications to the rule-matching algorithm and to the algorithm that attaches subtrees. These modifications increase the performance of the algorithm considerably. Experimental results show an improvement of 16.48% over the base algorithm.

English abstract

There are two major types of treebanks: dependency-based and constituency-based. Both have applications in natural language processing and computational linguistics. Several dependency treebanks have been developed for Persian; however, no large constituency treebank is available for this language. In this paper, we propose an algorithm for automatically converting a dependency treebank into a constituency treebank for Persian. Our method builds on an existing algorithm, which we modify to improve its accuracy. The base algorithm constructs a constituency structure according to a set of conversion rules. Each rule maps a dependency relation to a constituency subtree, and the constituency structure is built by combining these subtrees. We investigate how the order in which dependency relations are processed affects the output constituency structure, and show that the best order depends on the characteristics of the target language. We also modify the algorithm for matching conversion rules: to match a dependency relation to a conversion rule, we start with detailed information and, if no match is found, reduce the level of detail and change the matching method. Finally, we modify the algorithm used for combining the constituency subtrees: we use statistical data derived from a treebank to find a proper position for attaching a constituency subtree to the projection chain of the head. The experimental results show that these modifications improve the accuracy of the conversion algorithm by 16.48%.
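The rule-matching step described in the abstract (start with detailed information, then reduce the detail when no rule matches) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not say which features make up the "detailed information", so the `Edge` fields, the back-off order, the tag names, and the bracketed subtree strings in `RULES` are all hypothetical. The sketch also does not model the change of matching method that the abstract mentions.

```python
from typing import Dict, Iterator, NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    """A dependency edge; the choice of fields is an assumption."""
    head_pos: str   # part-of-speech tag of the head word
    dep_pos: str    # part-of-speech tag of the dependent word
    relation: str   # dependency relation label
    direction: str  # "left" if the dependent precedes the head, else "right"


Key = Tuple[str, Optional[str], Optional[str], Optional[str]]

# Hypothetical rule table: None acts as a wildcard for a dropped feature.
# The subtree strings are placeholders, not rules from the paper.
RULES: Dict[Key, str] = {
    ("V", "N", "SBJ", "left"): "(S (NP dep) (VP head))",
    ("V", "N", "SBJ", None): "(S (VP head) (NP dep))",
    ("V", None, None, None): "(VP head (XP dep))",
}


def backoff_keys(edge: Edge) -> Iterator[Key]:
    """Yield lookup keys from the most detailed to the least detailed."""
    yield (edge.head_pos, edge.dep_pos, edge.relation, edge.direction)
    yield (edge.head_pos, edge.dep_pos, edge.relation, None)
    yield (edge.head_pos, edge.dep_pos, None, None)
    yield (edge.head_pos, None, None, None)


def match_rule(edge: Edge, rules: Dict[Key, str]) -> Optional[Tuple[int, str]]:
    """Return (back-off level, subtree) for the first key found, else None."""
    for level, key in enumerate(backoff_keys(edge)):
        if key in rules:
            return level, rules[key]
    return None
```

With this table, an edge such as `Edge("V", "N", "SBJ", "right")` has no exact rule, so the lookup falls back one level and matches the direction-independent rule. Returning the back-off level alongside the subtree makes it easy to count how often the less detailed rules are needed.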

Publication year

1396 (Solar Hijri calendar)

Journal title

Signal and Data Processing

PDF file

7329385

Link to this document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=8&DC=997263