DocumentCode
3221666
Title
A Document Clustering Method Based on Hierarchical Algorithm with Model Clustering
Author
Sun, Haojun ; Liu, Zhihui ; Kong, Lingjun
Author_Institution
Shantou Univ., Shantou
fYear
2008
fDate
25-28 March 2008
Firstpage
1229
Lastpage
1233
Abstract
Document clustering is an important tool for text analysis and is used in many applications. This work develops a novel hierarchical algorithm for document clustering. We are particularly interested in studying the cluster overlap phenomenon and using it to design cluster merging criteria. Our previous papers reported theoretical results on the overlap rate between clusters based on the Gaussian mixture model. In this paper, we propose a new way to compute the overlap rate in order to improve time efficiency and accuracy: instead of the ridge curve, we use a line passing through the centers of the two clusters. Building on the hierarchical clustering method, we use the expectation-maximization (EM) algorithm to estimate the parameters of the Gaussian mixture model and merge the two sub-clusters whose overlap is the largest. Experiments on both public data and document clustering data show that this approach can improve the efficiency of clustering and save computing time.
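The merging criterion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the overlap-rate formula, so the sketch assumes the overlap rate is the lowest two-component mixture density on the segment joining the two centers, divided by the lower of the densities at the two centers. The names `gaussian_pdf`, `overlap_rate`, and `most_overlapping_pair` are hypothetical, and the mixture parameters are assumed to have been estimated already (for example by EM).

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    d = mean.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff.T).T          # cov^{-1} (x - mean), row-wise
    expo = -0.5 * np.sum(diff * solved, axis=1)
    norm = np.sqrt((2.0 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(expo) / norm


def overlap_rate(w1, mu1, cov1, w2, mu2, cov2, steps=201):
    """Assumed overlap measure along the line through the two centers.

    Samples the two-component mixture density on the segment from mu1 to mu2
    and returns (minimum on the segment) / (lower density at the two centers).
    The value is 1.0 when the density has no dip between the centers and
    approaches 0 when the components are well separated.
    """
    t = np.linspace(0.0, 1.0, steps)[:, None]
    pts = mu1 + t * (mu2 - mu1)
    dens = w1 * gaussian_pdf(pts, mu1, cov1) + w2 * gaussian_pdf(pts, mu2, cov2)
    lower_peak = min(dens[0], dens[-1])
    return float(dens.min() / lower_peak)


def most_overlapping_pair(weights, means, covs):
    """Return (i, j, rate) for the pair of components with the largest overlap."""
    best = None
    k = len(weights)
    for i in range(k):
        for j in range(i + 1, k):
            rate = overlap_rate(weights[i], means[i], covs[i],
                                weights[j], means[j], covs[j])
            if best is None or rate > best[2]:
                best = (i, j, rate)
    return best
```

In a hierarchical loop, the pair returned by `most_overlapping_pair` would be merged and the parameters re-estimated before the next step. Sampling a straight segment costs a fixed number of density evaluations per pair, which is the kind of saving over tracing a ridge curve that the abstract points to.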
Keywords
Gaussian processes; expectation-maximisation algorithm; text analysis; Gaussian mixture model; document clustering method; expectation-maximization algorithm; hierarchical clustering method; model clustering; ridge curve; Application software; Clustering algorithms; Clustering methods; Computer networks; Electronic mail; Frequency; Gaussian distribution; Mathematical model; Mathematics; Partitioning algorithms
fLanguage
English
Publisher
IEEE
Conference_Titel
Advanced Information Networking and Applications - Workshops, 2008. AINAW 2008. 22nd International Conference on
Conference_Location
Okinawa
Print_ISBN
978-0-7695-3096-3
Type
conf
DOI
10.1109/WAINA.2008.45
Filename
4483087