Title :
Weighted k-Means Algorithm Based Text Clustering
Author :
Chen, Xiuguo ; Yin, Wensheng ; Tu, Pinghui ; Zhang, Hengxi
Author_Institution :
Sch. of Mech. Sci. & Eng., Huazhong Univ. of Sci. & Technol., Wuhan, China
Abstract :
This paper proposes a weighted k-means clustering algorithm, based on the k-means algorithm (MacQueen, 1967; Anderberg, 1973), that can be used to cluster texts. First, the weighted k-means algorithm changes the descriptive approach of text objects, converting categorical attributes to numeric ones so that the dissimilarity of text objects can be measured by Euclidean distance. Then, it uses a weight vector to decrease the effects of irrelevant attributes and to reflect the semantic information of text objects. In an experiment, the weighted k-means algorithm is shown to be more effective than the k-means algorithm for clustering texts.
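The idea of weighting attributes inside the Euclidean distance can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the weighting formula, so the sketch assumes one non-negative weight per attribute, supplied in advance and applied to the squared differences. The names `weighted_distance` and `weighted_kmeans`, and the toy data, are hypothetical.

```python
import math


def weighted_distance(x, c, w):
    """Weighted Euclidean distance: sqrt(sum_j w[j] * (x[j] - c[j])**2).

    Assumption: weights enter as per-attribute multipliers on squared differences.
    """
    return math.sqrt(sum(wj * (xj - cj) ** 2 for xj, cj, wj in zip(x, c, w)))


def weighted_kmeans(points, centroids, weights, max_iter=100):
    """Lloyd-style k-means that uses the weighted distance above.

    points, centroids: lists of equal-length numeric vectors.
    weights: one non-negative weight per attribute (assumed given, not learned).
    Returns (labels, centroids).
    """
    centroids = [list(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: weighted_distance(p, centroids[k], weights))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: the first attribute separates the groups, the second is noise.
points = [[0.0, 0.0], [0.1, 9.0], [1.0, 0.2], [0.9, 9.1]]
init = [points[0], points[2]]

plain, _ = weighted_kmeans(points, init, [1.0, 1.0])     # ordinary k-means
weighted, _ = weighted_kmeans(points, init, [1.0, 0.0])  # noise attribute muted
print(plain)     # [0, 1, 0, 1]: grouped by the noisy attribute
print(weighted)  # [0, 0, 1, 1]: grouped by the relevant attribute
```

Setting all weights to 1 recovers ordinary k-means, which makes the two variants easy to compare on the same data and the same initial centroids.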
Keywords :
data mining; geometry; pattern clustering; text analysis; Euclidean distance; descriptive approach; semantic information; text clustering; text objects; weighted k-means algorithm; Clustering algorithms; Clustering methods; Electronic commerce; Electronic mail; Paper technology; Partitioning algorithms; Text mining; k-means clustering; weighting
Conference_Title :
Information Engineering and Electronic Commerce, 2009. IEEC '09. International Symposium on
Conference_Location :
Ternopil
Print_ISBN :
978-0-7695-3686-6
DOI :
10.1109/IEEC.2009.17