• DocumentCode
    499018
  • Title

    Mining the hottest topics on Chinese webpage based on the improved k-means partitioning

  • Author

    Wang, Yu ; Xi, Ya-hui ; Wang, Liang

  • Author_Institution
    Key Lab. of Machine Learning & Comput. Intell., Hebei Univ., Baoding, China
  • Volume
    1
  • fYear
    2009
  • fDate
    12-15 July 2009
  • Firstpage
    255
  • Lastpage
    260
  • Abstract
    This paper presents a new method for mining the hottest topics on Chinese Web pages, based on an improved k-means partitioning algorithm. The dictionary used for word segmentation is reduced by deleting words that are useless for clustering, and a dictionary tree is built from it, which speeds up word segmentation. Words are coded so that each word corresponds to an integer; each title is then expressed as a set of integers, which greatly reduces the space and time cost of clustering. Determining the value of k is a shortcoming of stream data mining based on k-means. With the new method, the value of k is adjusted during clustering, so both accuracy and speed are improved.
  • Keywords
    Internet; data mining; dictionaries; natural language processing; trees (mathematics); word processing; Chinese Web; dictionary tree; hottest topics mining; k-means partitioning; word segmentation; Clustering algorithms; Computational intelligence; Costs; Cybernetics; Data mining; Dictionaries; Internet; Machine learning; Natural languages; Space technology; Data stream; Mining the hottest topics; Word segmentation; k-means partitioning;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2009 International Conference on
  • Conference_Location
    Baoding
  • Print_ISBN
    978-1-4244-3702-3
  • Electronic_ISBN
    978-1-4244-3703-0
  • Type

    conf

  • DOI
    10.1109/ICMLC.2009.5212473
  • Filename
    5212473
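
The abstract describes three steps: segmenting titles with a dictionary tree, coding words as integers so each title becomes an integer set, and letting the number of clusters change during clustering. The record gives no implementation details, so the following is only a minimal sketch under stated assumptions. Forward maximum matching, Jaccard similarity, the `threshold` parameter, and the names `DictionaryTree`, `jaccard`, and `group_titles` are all illustrative choices, not taken from the paper. In particular, `group_titles` is a simple single-pass leader-style grouping that opens a new cluster when no existing one is similar enough; it stands in for, and is not, the authors' improved k-means.

```python
from typing import Dict, FrozenSet, List


class DictionaryTree:
    """Trie over dictionary words; each complete word carries an integer code."""

    def __init__(self, words: List[str]) -> None:
        self.root: dict = {}
        self.codes: Dict[str, int] = {}
        for word in words:
            # Assign codes in insertion order: 0, 1, 2, ...
            self.codes.setdefault(word, len(self.codes))
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node["#code"] = self.codes[word]  # marks the end of a word

    def encode(self, title: str) -> FrozenSet[int]:
        """Segment by forward maximum matching; return the set of word codes."""
        found = set()
        i = 0
        while i < len(title):
            node, best_code, best_end = self.root, None, i + 1
            for j in range(i, len(title)):
                node = node.get(title[j])
                if node is None:
                    break
                if "#code" in node:
                    # Remember the longest dictionary word seen so far.
                    best_code, best_end = node["#code"], j + 1
            if best_code is not None:
                found.add(best_code)
            i = best_end  # skips a single character when nothing matches
        return frozenset(found)


def jaccard(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Similarity of two integer sets (assumed measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_titles(encoded: List[FrozenSet[int]], threshold: float = 0.5) -> List[List[int]]:
    """Single-pass grouping; the cluster count grows when no cluster fits."""
    clusters: List[List[int]] = []  # first index in each list is the representative
    for idx, codes in enumerate(encoded):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = jaccard(codes, encoded[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([idx])  # open a new cluster, so k increases
        else:
            best.append(idx)
    return clusters
```

A small usage example with a hypothetical five-word dictionary:

```python
tree = DictionaryTree(["北京", "大学", "北京大学", "地震", "救援"])
titles = ["地震救援", "地震救援的", "北京大学"]
encoded = [tree.encode(t) for t in titles]
print(group_titles(encoded))
```

Working on integer sets rather than strings keeps the similarity computation to set intersection and union, which is the kind of space and time saving the abstract attributes to word coding.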