DocumentCode
499018
Title
Mining the hottest topics on Chinese webpage based on the improved k-means partitioning
Author
Wang, Yu ; Xi, Ya-hui ; Wang, Liang
Author_Institution
Key Lab. of Machine Learning & Comput. Intell., Hebei Univ., Baoding, China
Volume
1
fYear
2009
fDate
12-15 July 2009
Firstpage
255
Lastpage
260
Abstract
This paper presents a new method for mining the hottest topics on Chinese Web pages, based on an improved k-means partitioning algorithm. The dictionary used for word segmentation is reduced by deleting words that are useless for clustering, and a dictionary tree is built from it for word segmentation, which speeds up segmentation. Each word is then encoded as an integer, establishing a correspondence between words and integers, so that a page title can be expressed as a set of integers; this greatly reduces the space and time cost of clustering. Choosing the value of k is a known weakness of k-means-based data stream mining. The proposed method adjusts k during clustering, which improves both accuracy and speed.
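The preprocessing steps in the abstract (dictionary tree, word segmentation, word-to-integer coding, titles as integer sets) can be illustrated with a minimal sketch. It is not the authors' implementation. Assumptions: the "dictionary tree" is modeled as a character trie, segmentation uses forward maximum matching, characters not covered by the reduced dictionary are simply skipped, integer codes are assigned in order of first appearance, and Jaccard similarity stands in for whatever set distance the clustering step actually uses. The names `DictionaryTree`, `segment`, `encode_title`, and `jaccard` are hypothetical. The rule the paper uses to adjust k is not described in the abstract, so it is not sketched here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Hypothetical trie node: one child per next character.
    children: dict[str, TrieNode] = field(default_factory=dict)
    is_word: bool = False


class DictionaryTree:
    """Character trie built from the (already reduced) segmentation dictionary."""

    def __init__(self, words):
        self.root = TrieNode()
        for w in words:
            self.insert(w)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def longest_match(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                best = i - start + 1
        return best


def segment(text, tree):
    """Forward maximum matching; characters matching no dictionary word are skipped."""
    words, i = [], 0
    while i < len(text):
        n = tree.longest_match(text, i)
        if n:
            words.append(text[i:i + n])
            i += n
        else:
            i += 1
    return words


def encode_title(title, tree, codebook):
    """Represent a title as a set of integer word codes, assigning new codes on first sight."""
    return frozenset(codebook.setdefault(w, len(codebook)) for w in segment(title, tree))


def jaccard(a, b):
    """Set similarity between two encoded titles (assumed metric, not from the paper)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, with the dictionary `["热点", "话题", "中国", "网页", "挖掘"]` and an empty `codebook`, the title `"中国网页热点话题挖掘"` encodes to `{0, 1, 2, 3, 4}` and `"网页话题"` to `{1, 3}`, giving a Jaccard similarity of 0.4. Comparing small integer sets instead of strings is what makes clustering cheaper in both space and time.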
Keywords
Internet; data mining; dictionaries; natural language processing; trees (mathematics); word processing; Chinese Web; dictionary tree; hottest topics mining; k-means partitioning; word segmentation; Clustering algorithms; Computational intelligence; Costs; Cybernetics; Machine learning; Natural languages; Space technology; Data stream; Mining the hottest topics
fLanguage
English
Publisher
IEEE
Conference_Title
Machine Learning and Cybernetics, 2009 International Conference on
Conference_Location
Baoding
Print_ISBN
978-1-4244-3702-3
Electronic_ISBN
978-1-4244-3703-0
Type
conf
DOI
10.1109/ICMLC.2009.5212473
Filename
5212473