Record number :
440679
Article title :
Word classification for use in building a statistical language model for Persian
Title in another language :
Word Classification to use in Persian Class-based N-gram Language Models
Authors :
-, - compiler - Bazargani, N
Holdings information :
Biannual, year 1386, issue 8
Journal rank :
No scientific rank
Number of pages :
19
From page :
37
To page :
55
Keywords :
Word classification, class-based statistical language model, mutual information, perplexity
English abstract :
Statistical language models (SLMs) are widely used in speech recognition systems, and the N-gram language model is the most popular among them. In large-vocabulary systems, however, the training corpus is usually too small to estimate the n-gram parameters reliably, so the sparse data problem arises. Assigning words to a limited number of classes reduces the number of model parameters, so a class-based n-gram model can be trained on a moderately sized corpus. In this research, we implement several known automatic word classification methods on Persian and modify them to obtain better classification results. The first method, known as the Brown method, uses a statistical measure called "mutual information" to evaluate the word classification. The second method, presented by Martin, reduces perplexity through a displacement algorithm. The third method finds classes using a statistical similarity measure between words and a bottom-up algorithm. We implemented all of these methods on Persian and compared them by the perplexity of the class-based bigram models built on the resulting word classes. We then introduce two new methods that modify these known ones. In the first, the starting point of the Brown algorithm is modified, which leads to a smaller perplexity on test data. The second combines the displacement algorithm with a threshold for deciding whether to merge classes; it yields a smaller perplexity than the original Brown method and also automatically finds the best number of word classes, depending on the selected threshold.
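The abstract compares word classifications by the perplexity of a class-based bigram model. As a rough illustration only, and not the authors' implementation, the sketch below assumes the usual factorization P(w_i | w_{i-1}) = P(w_i | c(w_i)) * P(c(w_i) | c(w_{i-1})), with plain maximum-likelihood estimates and no smoothing, scored on the same text the counts come from. The function name `class_bigram_perplexity` and the `word2class` mapping are hypothetical.

```python
import math
from collections import Counter


def class_bigram_perplexity(tokens, word2class):
    """Perplexity of an unsmoothed class-based bigram model (illustrative sketch).

    Assumes every token has an entry in word2class and that the model is
    scored on the same text it was estimated from, so no probability is zero.
    """
    classes = [word2class[w] for w in tokens]

    word_counts = Counter(tokens)            # N(w)
    class_counts = Counter(classes)          # N(c)
    history_counts = Counter(classes[:-1])   # N(c) counted as a bigram history
    class_bigrams = Counter(zip(classes[:-1], classes[1:]))  # N(c_prev, c)

    log_prob = 0.0
    for i in range(1, len(tokens)):
        w, c_prev, c = tokens[i], classes[i - 1], classes[i]
        p_emit = word_counts[w] / class_counts[c]                       # P(w | c)
        p_trans = class_bigrams[(c_prev, c)] / history_counts[c_prev]   # P(c | c_prev)
        log_prob += math.log2(p_emit * p_trans)

    # Perplexity is 2 to the power of the average negative log2 probability.
    return 2 ** (-log_prob / (len(tokens) - 1))


tokens = "a b a b a b".split()
one_class_per_word = {"a": 0, "b": 1}
single_class = {"a": 0, "b": 0}
print(class_bigram_perplexity(tokens, one_class_per_word))
print(class_bigram_perplexity(tokens, single_class))
```

On this toy sequence, giving each word its own class keeps the alternation fully predictable, while merging both words into one class discards that information and raises the perplexity. This is the trade-off between fewer parameters and lower perplexity that a classification method has to balance.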
Publication year :
1386
Journal title :
پردازش علائم و داده ها (Signal and Data Processing)