Title :
Simultaneous categorization of text documents and identification of cluster-dependent keywords
Author :
Frigui, Hichem ; Nasraoui, Olfa
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Memphis, TN, USA
fDate :
2002
Abstract :
We propose an approach to clustering text documents based on a coupled process of clustering and cluster-dependent keyword weighting. The proposed approach is based on the fuzzy c-means clustering algorithm. Hence, it is simple both to compute and to implement. Moreover, it learns a different set of keyword weights for each cluster. This means that, as a by-product of the clustering process, each document cluster will be characterized by a possibly different set of keywords. The cluster-dependent keyword weights help in partitioning the document collection into more meaningful categories. They can also be used to automatically generate a brief summary of each cluster in terms of not only the attribute values, but also their relevance. For the case of text data, this approach can be used to automatically annotate the documents. We illustrate the performance of the proposed algorithm by using it to cluster a real collection of text documents.
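The abstract describes the coupling of clustering and keyword weighting only in words, so the following is a minimal sketch of one way such a scheme could look, not the authors' algorithm. It assumes a generic feature-weighted fuzzy c-means in which each cluster keeps its own weight vector over the keywords, and a keyword receives a high weight in a cluster when its values vary little among that cluster's documents. The function name `weighted_fcm`, the exponents `m` and `q`, the update rules, and the toy term vectors are all assumptions made for illustration.

```python
# Sketch only: fuzzy c-means with per-cluster feature (keyword) weights.
# The update rules are a generic feature-weighted variant assumed for
# illustration; they are not taken from the paper.

def weighted_fcm(X, c, m=2.0, q=2.0, iters=50, eps=1e-9):
    n, d = len(X), len(X[0])
    # Deterministic start: evenly spaced documents as centers, uniform weights.
    centers = [list(X[i * n // c]) for i in range(c)]
    weights = [[1.0 / d] * d for _ in range(c)]
    u = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Weighted squared distance of every document to every center.
        dist = [[sum(weights[i][k] ** q * (X[j][k] - centers[i][k]) ** 2
                     for k in range(d)) + eps
                 for j in range(n)] for i in range(c)]
        # Standard fuzzy c-means membership update on the weighted distances.
        for j in range(n):
            for i in range(c):
                u[i][j] = 1.0 / sum(
                    (dist[i][j] / dist[l][j]) ** (1.0 / (m - 1.0))
                    for l in range(c))
        for i in range(c):
            um = [u[i][j] ** m for j in range(n)]
            total = sum(um)
            # Center update: membership-weighted mean of the documents.
            centers[i] = [sum(um[j] * X[j][k] for j in range(n)) / total
                          for k in range(d)]
            # Keyword weights: low within-cluster dispersion -> high weight.
            disp = [sum(um[j] * (X[j][k] - centers[i][k]) ** 2
                        for j in range(n)) + eps for k in range(d)]
            inv = [(1.0 / s) ** (1.0 / (q - 1.0)) for s in disp]
            weights[i] = [x / sum(inv) for x in inv]
    return u, centers, weights


# Hypothetical toy data: columns are three keywords; the third is noise.
docs = [
    [1.0, 0.0, 0.3], [0.9, 0.1, 0.7], [1.0, 0.1, 0.1],  # topic A
    [0.0, 1.0, 0.6], [0.1, 0.9, 0.2], [0.1, 1.0, 0.8],  # topic B
]
u, centers, weights = weighted_fcm(docs, c=2)
# Hard label per document: the cluster with the largest membership.
labels = [max(range(2), key=lambda i: u[i][j]) for j in range(len(docs))]
```

Here `weights[i]` plays the role of the cluster-dependent keyword weights: sorting it in descending order gives a candidate keyword summary for cluster `i`. Note that in this simplified variant a keyword that is uniformly absent from a cluster also has low dispersion and would receive a high weight, so a practical version would need to account for that.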
Keywords :
fuzzy set theory; pattern clustering; text analysis; unsupervised learning; cluster-dependent keywords identification; clustering; document cluster; document collection; fuzzy c-means clustering algorithm; text documents categorization; Clustering algorithms; Data mining; Degradation; Frequency; Information retrieval; Nearest neighbor searches; Pattern recognition; Search engines; Text mining; Web sites;
Conference_Title :
Fuzzy Systems, 2002. FUZZ-IEEE'02. Proceedings of the 2002 IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
0-7803-7280-8
DOI :
10.1109/FUZZ.2002.1006659