Title of article :
A Hidden Markov Model for Morphology of Compound Roles in Persian Text Part of Tagging
Author/Authors :
Rezaei, H Department of Computer Engineering - Sari Branch - Islamic Azad University - Sari, Iran , Motameni, H Department of Computer Engineering - Sari Branch - Islamic Azad University - Sari, Iran , Barzegar, B Department of Computer Engineering - Babol Branch - Islamic Azad University - Babol, Iran
Abstract :
Data mining has grown in importance with the popularity of social networks and the emergence of abbreviated words, foreign terms, and emoticons in the Persian language. Numerous studies have addressed identifying the type of each word, yet identifying the role a word plays in a sentence is far more important than identifying its type. Moreover, the orthographic and grammatical similarity of Persian to Arabic allows the method proposed in this paper to be applied to Arabic as well. In this paper, we adopt the Hidden Markov Model (HMM) with Tri-gram tagging to identify the morphology of composition roles in Persian sentences, and we compare this technique against the HMM with Uni-gram and Bi-gram tagging. The proposed method improves word role identification by using "independent" and "dependent" roles together with several factors that contribute to the roles words take in sentences. The simulation results show that, for independent composition roles, the average success rate with HMM and Tri-gram tagging improved by 20.56% and 17.67% over the Uni-gram and Bi-gram methods, respectively. For dependent composition roles, the improvements were 24.67% and 32.62%, respectively.
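As an illustration of the general technique named in the abstract, the following is a minimal sketch of HMM tagging with tri-gram tag transitions, decoded with the Viterbi algorithm. It is not the authors' implementation: the function names (`train`, `viterbi`), the role tag set (`SUBJ`, `OBJ`, `MARK`, `VERB`), the transliterated toy sentences, and the add-one smoothing are all assumptions made for the example.

```python
import math
from collections import defaultdict

START, STOP = "<s>", "</s>"  # hypothetical boundary symbols


def train(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, and word emissions."""
    tri, ctx = defaultdict(int), defaultdict(int)
    emit, tag_count = defaultdict(int), defaultdict(int)
    vocab = set()
    for sent in tagged_sentences:
        for word, tag in sent:
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
        tags = [START, START] + [t for _, t in sent] + [STOP]
        for i in range(2, len(tags)):
            tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
            ctx[(tags[i - 2], tags[i - 1])] += 1
    return tri, ctx, emit, tag_count, vocab


def viterbi(words, model):
    """Return the most probable tag sequence under the tri-gram HMM."""
    tri, ctx, emit, tag_count, vocab = model
    tags = sorted(tag_count)
    n_out = len(tags) + 1  # possible next symbols: every tag plus STOP

    def log_trans(u, v, t):
        # Add-one smoothed P(t | u, v); the smoothing is an assumption.
        return math.log((tri[(u, v, t)] + 1) / (ctx[(u, v)] + n_out))

    def log_emit(t, w):
        # Add-one smoothed P(w | t), reserving one slot for unknown words.
        return math.log((emit[(t, w)] + 1) / (tag_count[t] + len(vocab) + 1))

    # best[(u, v)] = (log score, tags so far) for paths ending in tags u, v
    best = {(START, START): (0.0, [])}
    for w in words:
        nxt = {}
        for (u, v), (score, path) in best.items():
            for t in tags:
                s = score + log_trans(u, v, t) + log_emit(t, w)
                if (v, t) not in nxt or s > nxt[(v, t)][0]:
                    nxt[(v, t)] = (s, path + [t])
        best = nxt
    # Close each surviving path with the transition into STOP.
    (u, v), (score, path) = max(
        best.items(),
        key=lambda kv: kv[1][0] + log_trans(kv[0][0], kv[0][1], STOP),
    )
    return path


# Toy, transliterated training data with a hypothetical role tag set.
corpus = [
    [("ali", "SUBJ"), ("ketab", "OBJ"), ("ra", "MARK"), ("khand", "VERB")],
    [("sara", "SUBJ"), ("nameh", "OBJ"), ("ra", "MARK"), ("nevesht", "VERB")],
]
model = train(corpus)
print(viterbi(["sara", "ketab", "ra", "khand"], model))
```

Conditioning each tag on the two preceding tags is what distinguishes tri-gram tagging from the Uni-gram and Bi-gram baselines, which condition on no previous tag and on one previous tag, respectively. The cost is a state space of tag pairs rather than single tags, which is why smoothing matters on small corpora.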
Farsi abstract (translated to English) :
Today, owing to the popularity of social networks and the entry of abbreviated words, foreign words, and emoticons into the Persian language, the importance of data mining has increased, and studies have been carried out on identifying word types. However, identifying the role of a word in a sentence is more important than identifying its type. Moreover, the orthographic and grammatical similarity of Persian to Arabic means that the method proposed in this paper can also be used for Arabic. In this paper, the statistical Hidden Markov Model method with Tri-gram tagging is used for the morphology of composition roles in Persian sentences, and it is compared with the Hidden Markov Model with Uni-gram and Bi-gram tagging. By using two categories of roles, "independent" and "dependent", together with the factors that determine the roles words take in sentences, the proposed method improves word role identification. The simulation results show that the average success rate for independent composition roles with the Hidden Markov Model and Tri-gram tagging improved by 20.56 and 17.67 percent relative to the Uni-gram and Bi-gram methods, respectively, and for dependent composition roles by 24.67 and 32.62 percent, respectively.
Keywords :
Tri-gram , dependent roles , morphology , Hidden Markov Model , independent roles
Journal title :
International Journal of Engineering