Regional Information Center for Science and Technology

Conference record number:

2139

Article title:

Semantically Clustering of Persian Words

Title in another language:

Semantically Clustering of Persian Words

Authors:

Arasteh Alireza (author), Elahimanesh Mohammad Hossein (author), Sharif Ahmad (author), Minaei Bidgoli Behrouz (author)

Number of pages:

Keywords:

Word clustering, Text Mining, Persian NLP, Graph-based Clustering

Publication year:

1391

Conference title:

First International Conference on Persian Script and Language Processing

Document language:

Persian

Abstract (Latin script):

Clustering is a data mining task that divides a set of objects into groups so that similar objects fall into the same group and objects with different features are placed in separate groups. This paper presents a technique for semantic word clustering, an application of data mining techniques to natural language processing. Word clustering is used in various fields of text mining, such as word disambiguation, information retrieval, language modeling, and text classification. The paper proposes a graph-based method for clustering Persian words. The proposed method is a type of pattern-based clustering and consists of two parts. In the first part, a word co-occurrence graph is built using statistical similarity measures such as chi-square, pointwise mutual information (PMI), and cosine. In the second part, the graph is divided into appropriate clusters by Newman's graph clustering algorithm. Our results show that chi-square is the best measure for clustering Persian words.
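The first part of the method described in the abstract (scoring word pairs and turning the scores into a co-occurrence graph) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: the sentence-level co-occurrence window, the 2x2 contingency-table form of chi-square, the threshold value, the function names, and the toy English tokens standing in for Persian words. The clustering step is not shown, since the abstract does not say which of Newman's algorithms is used.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count the sentences containing each word and each unordered word pair."""
    word_df = Counter()
    pair_df = Counter()
    for tokens in sentences:
        vocab = sorted(set(tokens))  # sorted so each pair has one canonical order
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df, len(sentences)


def pmi(n_xy, n_x, n_y, n):
    """Pointwise mutual information: log2(P(x, y) / (P(x) * P(y)))."""
    return math.log2((n_xy * n) / (n_x * n_y))


def chi_square(n_xy, n_x, n_y, n):
    """Pearson chi-square statistic of the 2x2 contingency table for x and y."""
    a = n_xy                      # sentences with both words
    b = n_x - n_xy                # x without y
    c = n_y - n_xy                # y without x
    d = n - n_x - n_y + n_xy      # neither word
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def cosine(n_xy, n_x, n_y):
    """Cosine similarity of the two words' binary sentence-occurrence vectors."""
    return n_xy / math.sqrt(n_x * n_y)


def build_graph(sentences, measure="chi2", threshold=0.0):
    """Return a weighted undirected graph as a dict of dicts.

    Only pairs that co-occur at least once are scored, and only scores
    above the (hypothetical) threshold become edges. Note that chi-square
    is also large for negatively associated pairs; this sketch does not
    filter those out.
    """
    word_df, pair_df, n = cooccurrence_counts(sentences)
    graph = {}
    for (x, y), n_xy in pair_df.items():
        if measure == "chi2":
            w = chi_square(n_xy, word_df[x], word_df[y], n)
        elif measure == "pmi":
            w = pmi(n_xy, word_df[x], word_df[y], n)
        elif measure == "cosine":
            w = cosine(n_xy, word_df[x], word_df[y])
        else:
            raise ValueError(f"unknown measure: {measure}")
        if w > threshold:
            graph.setdefault(x, {})[y] = w
            graph.setdefault(y, {})[x] = w
    return graph


# Toy corpus: each inner list is one tokenized sentence.
toy_sentences = [
    ["cat", "dog"],
    ["cat", "dog"],
    ["sun", "moon"],
    ["sun", "moon"],
]
toy_graph = build_graph(toy_sentences, measure="chi2", threshold=3.0)
```

A graph of this shape could then be handed to a community detection routine for the second part. Which algorithm to use is left open here; modularity-based and edge-betweenness-based methods are both associated with Newman.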

Conference document number:

4474716

From page:

To page:

Link to this document:

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=36&DC=101899