DocumentCode :
499018
Title :
Mining the hottest topics on Chinese webpage based on the improved k-means partitioning
Author :
Wang, Yu ; Xi, Ya-hui ; Wang, Liang
Author_Institution :
Key Lab. of Machine Learning & Comput. Intell., Hebei Univ., Baoding, China
Volume :
1
fYear :
2009
fDate :
12-15 July 2009
Firstpage :
255
Lastpage :
260
Abstract :
This paper presents a new method for mining the hottest topics on Chinese Web pages, based on an improved k-means partitioning algorithm. The dictionary used for word segmentation is reduced by deleting words that are useless for clustering, and a dictionary tree is built from it, which speeds up word segmentation. Words are coded as integers, so each title is represented as a set of integers, which greatly reduces the space and time cost of clustering. Having to fix the value of k in advance is a shortcoming of stream data mining based on k-means; in the new method, the value of k is adjusted during clustering. As a result, both accuracy and speed are improved.
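As a rough illustration of the two preprocessing ideas named in the abstract (a dictionary tree for word segmentation, and coding words as integers so that a title becomes an integer set), here is a minimal sketch. It is not the authors' implementation: the abstract does not say which matching strategy is used, so forward maximum matching is an assumption, and the names `TrieNode`, `build_trie`, `segment`, and `encode` are hypothetical.

```python
from typing import Dict, Iterable, List, Set


class TrieNode:
    """One node of the dictionary tree (hypothetical structure)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: Iterable[str]) -> TrieNode:
    """Build a dictionary tree from the (already reduced) word list."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def segment(text: str, root: TrieNode) -> List[str]:
    """Forward maximum matching (an assumption, not stated in the abstract).

    At each position, take the longest dictionary word starting there;
    characters that start no dictionary word are skipped.
    """
    tokens: List[str] = []
    i = 0
    while i < len(text):
        node, j, longest = root, i, 0
        while j < len(text) and text[j] in node.children:
            node = node.children[text[j]]
            j += 1
            if node.is_word:
                longest = j - i
        if longest:
            tokens.append(text[i:i + longest])
            i += longest
        else:
            i += 1
    return tokens


def encode(tokens: Iterable[str], codes: Dict[str, int]) -> Set[int]:
    """Map each word to an integer code, assigning new codes on first sight."""
    return {codes.setdefault(tok, len(codes)) for tok in tokens}


dictionary = ["北京", "大学", "北京大学", "地震"]
trie = build_trie(dictionary)
codes: Dict[str, int] = {}
title_tokens = segment("北京大学的地震", trie)
title_set = encode(title_tokens, codes)
```

Representing each title as a small integer set means that comparing two titles reduces to set operations on integers rather than string comparisons, which is one plausible reading of the space and time savings the abstract claims.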
Keywords :
Internet; data mining; dictionaries; natural language processing; trees (mathematics); word processing; Chinese Web; dictionary tree; hottest topics mining; k-means partitioning; word segmentation; Clustering algorithms; Computational intelligence; Costs; Cybernetics; Data mining; Dictionaries; Internet; Machine learning; Natural languages; Space technology; Data stream; Mining the hottest topics; Word segmentation; k-means partitioning;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location :
Baoding
Print_ISBN :
978-1-4244-3702-3
Electronic_ISBN :
978-1-4244-3703-0
Type :
conf
DOI :
10.1109/ICMLC.2009.5212473
Filename :
5212473