Conference record number:
2139
Article title:
Semantically Clustering of Persian Words
Title in another language:
Semantically Clustering of Persian Words
Authors:
Arasteh Alireza (author), Elahimanesh Mohammad Hossein (author), Sharif Ahmad (author), Minaei Bidgoli Behrouz (author)
Keywords:
Word Clustering, Text Mining, Persian NLP, Graph-based Clustering
Conference title:
First International Conference on Persian Script and Language Processing (نخستين كنفرانس بين المللي پردازش خط و زبان فارسي)
Abstract (English):
Clustering is a data mining task that aims to divide a set of objects into groups so that similar objects fall into the same group and objects with different features are placed in separate groups. This paper presents a technique for semantic word clustering, an application of data mining techniques to natural language processing. Word clustering is used in various fields of text mining, such as word disambiguation, information retrieval, language modeling, and text classification. The paper proposes a graph-based method for clustering Persian words; the proposed method is a type of pattern-based clustering. The method has two parts. In the first part, a word co-occurrence graph is obtained using statistical similarity measures such as chi-square, pointwise mutual information (PMI), and cosine. In the second part, the graph is divided into appropriate clusters by Newman's graph clustering algorithm. Our research shows that chi-square is the best measure for clustering Persian words.
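The two-part pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how co-occurrence is counted, how edges are filtered, or which of Newman's algorithms is used. The sketch assumes sentence-level co-occurrence, keeps only positively associated word pairs, and uses a naive greedy modularity agglomeration in the style of Newman's fast algorithm. All function names and the toy sentences (English placeholders rather than Persian text) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count the sentences containing each word and each unordered word pair."""
    word_df, pair_df = Counter(), Counter()
    for sent in sentences:
        vocab = sorted(set(sent))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))
    return word_df, pair_df


def pmi(n_ab, n_a, n_b, n):
    """Pointwise mutual information: log2(P(a,b) / (P(a) P(b)))."""
    return math.log2(n_ab * n / (n_a * n_b))


def chi_square(n_ab, n_a, n_b, n):
    """Chi-square statistic of the 2x2 contingency table for words a and b."""
    o12, o21, o22 = n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab
    den = n_a * n_b * (n - n_a) * (n - n_b)
    return n * (n_ab * o22 - o12 * o21) ** 2 / den if den else 0.0


def cosine(n_ab, n_a, n_b, n):
    """Cosine similarity of the binary sentence-occurrence vectors of a and b."""
    return n_ab / math.sqrt(n_a * n_b)


def build_graph(sentences, measure):
    """Part 1: weighted co-occurrence graph as {(word_a, word_b): weight}."""
    word_df, pair_df = cooccurrence_counts(sentences)
    n = len(sentences)
    edges = {}
    for (a, b), n_ab in pair_df.items():
        # Assumption: keep a pair only if it co-occurs more often than chance.
        if n_ab * n > word_df[a] * word_df[b]:
            edges[(a, b)] = measure(n_ab, word_df[a], word_df[b], n)
    return edges


def greedy_modularity(edges):
    """Part 2: merge communities greedily while modularity keeps increasing."""
    total = sum(edges.values())
    degree = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    label = {word: word for word in degree}  # every word starts alone
    while True:
        between, comm_deg = Counter(), Counter()
        for (a, b), w in edges.items():
            if label[a] != label[b]:
                between[tuple(sorted((label[a], label[b])))] += w
        for word, d in degree.items():
            comm_deg[label[word]] += d
        best, best_gain = None, 0.0
        for (ca, cb), w in sorted(between.items()):
            # Modularity change caused by merging communities ca and cb.
            gain = w / total - comm_deg[ca] * comm_deg[cb] / (2 * total ** 2)
            if gain > best_gain + 1e-12:
                best, best_gain = (ca, cb), gain
        if best is None:
            break
        for word in label:
            if label[word] == best[1]:
                label[word] = best[0]
    clusters = {}
    for word, c in label.items():
        clusters.setdefault(c, []).append(word)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy corpus; each inner list is one tokenized sentence.
TOY_SENTENCES = [
    ["river", "water", "fish"], ["river", "water"], ["water", "fish"],
    ["book", "paper", "pen"], ["book", "pen"], ["paper", "pen"],
    ["river", "book"],
]

clusters = greedy_modularity(build_graph(TOY_SENTENCES, chi_square))
print(clusters)
```

Passing `pmi` or `cosine` instead of `chi_square` to `build_graph` swaps the similarity measure while leaving the clustering step unchanged, which is one way the measures named in the abstract could be compared on the same corpus.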
Conference document number:
4474716