Title :
A Document Clustering Method Based on Hierarchical Algorithm with Model Clustering
Author :
Sun, Haojun ; Liu, Zhihui ; Kong, Lingjun
Author_Institution :
Shantou Univ., Shantou
Abstract :
Document clustering is an important tool for text analysis and is used in many applications. This work develops a novel hierarchical algorithm for document clustering. We are particularly interested in studying the cluster overlap phenomenon and using it to design cluster merging criteria. Our previous papers reported theoretical results on the overlap rate between clusters based on the Gaussian mixture model. In this paper, we propose a new way to compute the overlap rate in order to improve time efficiency and accuracy: we use a line passing through the two clusters' centers instead of the ridge curve. Within the hierarchical clustering method, we use the expectation-maximization (EM) algorithm for the Gaussian mixture model to estimate the parameters, and we merge the two sub-clusters whose overlap is the largest. Experiments on both public data and document clustering data show that this approach can improve the efficiency of clustering and save computing time.
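The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not give the exact overlap-rate formula. The sketch assumes that the component parameters (weight, mean, variance) have already been estimated, for example by EM, that covariances are diagonal, and that the overlap is approximated by the lowest mixture density on the segment between the two centers divided by the smaller density at the two centers. The function names and the moment-matching merge are hypothetical choices made for illustration only.

```python
import math
from itertools import combinations


def diag_gauss_pdf(x, mean, var):
    """Density at x of a Gaussian with diagonal covariance (assumed form)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def overlap_on_center_line(c1, c2, steps=200):
    """Assumed overlap proxy, sampled on the line segment joining the centers.

    Returns the lowest two-component mixture density on the segment divided
    by the smaller density at the two centers. A value near 1 means there is
    no clear valley between the clusters; a value near 0 means they are
    well separated.
    """
    w1, m1, v1 = c1
    w2, m2, v2 = c2

    def mix(x):
        return w1 * diag_gauss_pdf(x, m1, v1) + w2 * diag_gauss_pdf(x, m2, v2)

    densities = []
    for k in range(steps + 1):
        t = k / steps
        point = [a + t * (b - a) for a, b in zip(m1, m2)]
        densities.append(mix(point))
    return min(densities) / min(densities[0], densities[-1])


def merge(c1, c2):
    """Merge two components by moment matching (an illustrative assumption)."""
    w1, m1, v1 = c1
    w2, m2, v2 = c2
    w = w1 + w2
    mean = [(w1 * a + w2 * b) / w for a, b in zip(m1, m2)]
    var = [
        (w1 * (s + a * a) + w2 * (t + b * b)) / w - m * m
        for a, b, s, t, m in zip(m1, m2, v1, v2, mean)
    ]
    return (w, mean, var)


def merge_most_overlapping(components):
    """One hierarchical step: merge the pair with the largest overlap."""
    i, j = max(
        combinations(range(len(components)), 2),
        key=lambda p: overlap_on_center_line(components[p[0]], components[p[1]]),
    )
    merged = merge(components[i], components[j])
    rest = [c for k, c in enumerate(components) if k not in (i, j)]
    return rest + [merged], (i, j)


# Toy parameters (weight, mean, variance); invented values, not from the paper.
components = [
    (0.3, [0.0, 0.0], [1.0, 1.0]),
    (0.3, [1.0, 0.0], [1.0, 1.0]),
    (0.4, [10.0, 10.0], [1.0, 1.0]),
]
reduced, pair = merge_most_overlapping(components)
```

In this toy input the first two components sit close together, so they form the pair that gets merged, while the distant third component is left alone. Repeating the step until a stopping rule is met would give a hierarchical procedure of the kind the abstract outlines; the stopping rule itself is not specified there.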
Keywords :
Gaussian processes; expectation-maximization algorithm; text analysis; Gaussian mixture model; document clustering method; hierarchical clustering method; model clustering; ridge curve; Application software; Clustering algorithms; Clustering methods; Computer networks; Electronic mail; Frequency; Gaussian distribution; Mathematical model; Mathematics; Partitioning algorithms
Conference_Title :
Advanced Information Networking and Applications - Workshops, 2008. AINAW 2008. 22nd International Conference on
Conference_Location :
Okinawa
Print_ISBN :
978-0-7695-3096-3
DOI :
10.1109/WAINA.2008.45