• DocumentCode
    3695943
  • Title

    Word Clustering Algorithms Based on Word Similarity

  • Author

    Lichi Yuan

  • Author_Institution
    Sch. of Inf. Technol., Jiangxi Univ. of Finance &
  • Volume
    1
  • fYear
    2015
  • Firstpage
    21
  • Lastpage
    24
  • Abstract
    Category-based statistical language models are an important way to address the problem of sparse data, but they have two bottlenecks: (1) word clustering: it is hard to find a clustering method that performs well without requiring a large amount of computation; (2) class-based methods always lose some predictive power when adapting to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented. Building on word similarity, a definition of word-set similarity is given. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218.
  • Keywords
    "Clustering algorithms","Data models","Predictive models","Adaptation models","Semantics","Algorithm design and analysis","Probabilistic logic"
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2015 7th International Conference on
  • Print_ISBN
    978-1-4799-8645-3
  • Type

    conf

  • DOI
    10.1109/IHMSC.2015.36
  • Filename
    7334642