Title :
Word Clustering Algorithms Based on Word Similarity
Author_Institution :
Sch. of Inf. Technol., Jiangxi Univ. of Finance &
Abstract :
Category-based statistical language models are an important way to address the sparse-data problem, but they face two bottlenecks: (1) word clustering, where it is hard to find a method that performs well without a large amount of computation; and (2) loss of predictive ability, since a class-based method always gives up some prediction power when adapting to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and from it a definition of word set similarity is given. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218.
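The abstract does not state the paper's actual definitions, so the following is only a minimal sketch of how a mutual-information-based word similarity and a word set similarity might look. Every name here (`pmi_vectors`, `word_similarity`, `set_similarity`) is hypothetical, and the choices of positive pointwise mutual information over right-neighbor bigrams, cosine similarity, and average pairwise linkage are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def pmi_vectors(tokens):
    """Map each word to {right neighbor: positive PMI}, from adjacent-word counts.

    Assumption: PMI over adjacent bigrams stands in for the paper's
    mutual-information measure, which the abstract does not specify.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    vectors = {}
    for (w, v), count in bigrams.items():
        p_joint = count / n_bi
        p_indep = (unigrams[w] / n_uni) * (unigrams[v] / n_uni)
        pmi = math.log(p_joint / p_indep)
        if pmi > 0:  # keep only positive associations
            vectors.setdefault(w, {})[v] = pmi
    return vectors


def word_similarity(vectors, a, b):
    """Cosine similarity of two words' PMI context vectors (assumed measure)."""
    va, vb = vectors.get(a, {}), vectors.get(b, {})
    dot = sum(x * vb.get(k, 0.0) for k, x in va.items())
    norm_a = math.sqrt(sum(x * x for x in va.values()))
    norm_b = math.sqrt(sum(x * x for x in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def set_similarity(vectors, set_a, set_b):
    """Average pairwise word similarity between two word sets (assumed linkage)."""
    pairs = [(a, b) for a in set_a for b in set_b]
    if not pairs:
        return 0.0
    return sum(word_similarity(vectors, a, b) for a, b in pairs) / len(pairs)


tokens = "the cat sat on the mat the dog sat on the rug".split()
vectors = pmi_vectors(tokens)
sim_cat_dog = word_similarity(vectors, "cat", "dog")  # both are followed by "sat"
sim_cat_on = word_similarity(vectors, "cat", "on")    # no shared right neighbor
```

Under these assumptions, a clustering procedure could merge the two word sets with the highest `set_similarity` at each step, which avoids re-evaluating a global likelihood for every candidate merge as a greedy method would.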
Keywords :
"Clustering algorithms","Data models","Predictive models","Adaptation models","Semantics","Algorithm design and analysis","Probabilistic logic"
Conference_Title :
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2015 7th International Conference on
Print_ISBN :
978-1-4799-8645-3
DOI :
10.1109/IHMSC.2015.36