DocumentCode :
1659954
Title :
Simultaneous categorization of text documents and identification of cluster-dependent keywords
Author :
Frigui, Hichem ; Nasraoui, Olfa
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Memphis, TN, USA
Volume :
2
fYear :
2002
fDate :
2002
Firstpage :
1108
Lastpage :
1113
Abstract :
We propose an approach to clustering text documents based on a coupled process of clustering and cluster-dependent keyword weighting. The proposed approach is based on the fuzzy c-means clustering algorithm, so it is simple both to compute and to implement. Moreover, it learns a different set of keyword weights for each cluster. As a by-product of the clustering process, each document cluster is therefore characterized by a possibly different set of keywords. These cluster-dependent keyword weights help partition the document collection into more meaningful categories. They can also be used to automatically generate a brief summary of each cluster in terms of not only the attribute values but also their relevance. For text data, this approach can be used to annotate documents automatically. We illustrate the performance of the proposed algorithm by using it to cluster a real collection of text documents.
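The abstract describes fuzzy c-means (FCM) extended with a separate keyword-weight vector per cluster. The record does not give the paper's objective function or update equations, so the following is only a minimal sketch under stated assumptions: memberships use the standard FCM formula with fuzzifier `m`, but each cluster measures distance with its own keyword weights; those weights are set inversely proportional to each keyword's within-cluster fuzzy dispersion, regularized by a hypothetical `eps` (an assumed rule, not necessarily the authors'). The deterministic farthest-point seeding, the toy vocabulary, and the summary score (weight times centroid value) are likewise illustrative assumptions.

```python
from typing import List, Tuple

Vector = List[float]


def weighted_sq_dist(x: Vector, center: Vector, weights: Vector) -> float:
    """Squared Euclidean distance with one relevance weight per keyword."""
    return sum(w * (a - b) ** 2 for a, b, w in zip(x, center, weights))


def farthest_point_init(docs: List[Vector], c: int) -> List[Vector]:
    """Assumed deterministic seeding: first document, then the farthest remaining one."""
    uniform = [1.0] * len(docs[0])
    centers = [list(docs[0])]
    while len(centers) < c:
        far = max(docs, key=lambda x: min(weighted_sq_dist(x, ctr, uniform) for ctr in centers))
        centers.append(list(far))
    return centers


def fcm_keyword_weights(docs: List[Vector], c: int, m: float = 2.0, eps: float = 1e-3,
                        max_iter: int = 100, tol: float = 1e-6
                        ) -> Tuple[List[Vector], List[Vector], List[Vector]]:
    """Hypothetical FCM variant returning (memberships, centers, per-cluster keyword weights)."""
    n, p = len(docs), len(docs[0])
    centers = farthest_point_init(docs, c)
    weights = [[1.0 / p] * p for _ in range(c)]  # start with uniform keyword weights
    u = [[0.0] * n for _ in range(c)]
    for _ in range(max_iter):
        # 1) Standard FCM membership update, using each cluster's own weighted distance.
        new_u = [[0.0] * n for _ in range(c)]
        for j, x in enumerate(docs):
            d = [max(weighted_sq_dist(x, centers[i], weights[i]), 1e-12) for i in range(c)]
            for i in range(c):
                new_u[i][j] = 1.0 / sum((d[i] / d[l]) ** (1.0 / (m - 1.0)) for l in range(c))
        shift = max(abs(new_u[i][j] - u[i][j]) for i in range(c) for j in range(n))
        u = new_u
        for i in range(c):
            um = [u[i][j] ** m for j in range(n)]
            total = sum(um)
            # 2) Center update: u^m-weighted mean of the documents.
            centers[i] = [sum(um[j] * docs[j][k] for j in range(n)) / total for k in range(p)]
            # 3) Assumed weight rule: inverse within-cluster fuzzy dispersion, normalized to sum 1.
            disp = [sum(um[j] * (docs[j][k] - centers[i][k]) ** 2 for j in range(n)) for k in range(p)]
            inv = [1.0 / (dk + eps) for dk in disp]
            s = sum(inv)
            weights[i] = [v / s for v in inv]
        if shift < tol:
            break
    return u, centers, weights


def summarize(centers: List[Vector], weights: List[Vector], vocab: List[str], top: int = 2) -> List[List[str]]:
    """Label each cluster with the keywords scoring highest on weight * centroid value."""
    out = []
    for ctr, w in zip(centers, weights):
        ranked = sorted(range(len(vocab)), key=lambda k: w[k] * ctr[k], reverse=True)
        out.append([vocab[k] for k in ranked[:top]])
    return out


# Toy usage: raw keyword counts turned into relative term frequencies.
vocab = ["goal", "match", "stock", "market"]
counts = [[3, 2, 0, 0], [2, 2, 0, 0], [1, 3, 0, 0], [0, 0, 3, 1], [0, 0, 2, 2], [0, 0, 2, 3]]
docs = [[cnt / sum(row) for cnt in row] for row in counts]
u, centers, weights = fcm_keyword_weights(docs, c=2)
labels = [max(range(len(u)), key=lambda i: u[i][j]) for j in range(len(docs))]
summaries = summarize(centers, weights, vocab)
print(labels, summaries)
```

The summary multiplies each weight by the centroid value because, under this assumed inverse-dispersion rule, keywords that are consistently absent from a cluster also have low dispersion and therefore high weight; they help separate clusters but would make poor descriptive labels on their own.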
Keywords :
fuzzy set theory; pattern clustering; text analysis; unsupervised learning; cluster-dependent keywords identification; clustering; document cluster; document collection; fuzzy c-means clustering algorithm; text documents categorization; Clustering algorithms; Data mining; Degradation; Frequency; Information retrieval; Nearest neighbor searches; Pattern recognition; Search engines; Text mining; Web sites;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Fuzzy Systems, 2002. FUZZ-IEEE'02. Proceedings of the 2002 IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
0-7803-7280-8
Type :
conf
DOI :
10.1109/FUZZ.2002.1006659
Filename :
1006659