Title :
An Improved Hierarchical K-Means Algorithm for Web Document Clustering
Author :
Liu, Yongxin ; Liu, Zhijing
Author_Institution :
Sch. of Comput. Sci. & Technol., Xidian Univ., Xian
fDate :
Aug. 29 2008-Sept. 2 2008
Abstract :
To address the major challenges of current Web document clustering, namely the huge volume of documents and high-dimensional processing, we propose a simple agglomerative hierarchical k-means clustering (SAHKC) algorithm based on the H-K (hierarchical k-means) algorithm. We also describe Web documents with a new model, the multiple feature vector space model (MFVSM). Experimental results indicate that the MFVSM helps improve the quality of the clustering results. Compared with the H-K algorithm, the SAHKC algorithm reduces running time by nearly 30%, while the average precision of its clustering results drops by only about 10%.
Keywords :
Internet; document handling; pattern clustering; Web document clustering; multiple feature vector space model; simple agglomerative hierarchical k-means clustering; Clustering algorithms; Clustering methods; Computer science; Data mining; Databases; Greedy algorithms; HTML; Information technology; Partitioning algorithms; Web mining; K-Means; vector space model (VSM); web document clustering;
Conference_Title :
Computer Science and Information Technology, 2008. ICCSIT '08. International Conference on
Conference_Location :
Singapore
Print_ISBN :
978-0-7695-3308-7
DOI :
10.1109/ICCSIT.2008.152