DocumentCode
3695943
Title
Word Clustering Algorithms Based on Word Similarity
Author
Lichi Yuan
Author_Institution
Sch. of Inf. Technol., Jiangxi Univ. of Finance &
Volume
1
fYear
2015
Firstpage
21
Lastpage
24
Abstract
The category-based statistical language model is an important way to address the problem of sparse data, but it has two bottlenecks: (1) word clustering: it is hard to find a clustering method that performs well without requiring a large amount of computation; (2) the class-based method always loses some predictive ability when adapted to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and from it a definition of word-set similarity is derived. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218.
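The abstract does not give the exact formulas, so the following Python sketch only illustrates the general idea under stated assumptions. Each word is represented by its positive pointwise mutual information (PMI) with its immediate left and right neighbours. Word similarity is the cosine of those vectors, and word-set similarity is the average pairwise word similarity (average linkage). Clusters are merged bottom-up until k remain. The feature choice, the cosine, the averaging rule, and the function names `context_ppmi`, `word_similarity`, `set_similarity`, and `cluster_words` are all hypothetical and are not the paper's definitions.

```python
import math
from collections import Counter
from itertools import combinations


def context_ppmi(sentences):
    """Map each word to {context: positive PMI}.

    ASSUMPTION: contexts are the immediate left ("L:") and right ("R:")
    neighbours; the paper's actual mutual-information features may differ.
    """
    pair_counts = Counter()
    for sent in sentences:
        for i, word in enumerate(sent):
            if i > 0:
                pair_counts[(word, "L:" + sent[i - 1])] += 1
            if i + 1 < len(sent):
                pair_counts[(word, "R:" + sent[i + 1])] += 1
    total = sum(pair_counts.values())
    word_counts, ctx_counts = Counter(), Counter()
    for (word, ctx), n in pair_counts.items():
        word_counts[word] += n
        ctx_counts[ctx] += n
    vectors = {word: {} for word in word_counts}
    for (word, ctx), n in pair_counts.items():
        # PMI = log P(w, c) / (P(w) P(c)); keep only positive associations.
        pmi = math.log(n * total / (word_counts[word] * ctx_counts[ctx]))
        if pmi > 0:
            vectors[word][ctx] = pmi
    return vectors


def word_similarity(u, v):
    """Cosine similarity of two sparse PMI vectors (hypothetical choice)."""
    dot = sum(x * v.get(c, 0.0) for c, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def set_similarity(a, b, vectors):
    """Word-set similarity as the mean pairwise word similarity (average linkage)."""
    total = sum(word_similarity(vectors[x], vectors[y]) for x in a for y in b)
    return total / (len(a) * len(b))


def cluster_words(sentences, k):
    """Agglomerative clustering: repeatedly merge the most similar pair of sets."""
    if k < 1:
        raise ValueError("k must be at least 1")
    vectors = context_ppmi(sentences)
    clusters = [frozenset([w]) for w in sorted(vectors)]
    while len(clusters) > k:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: set_similarity(clusters[p[0]], clusters[p[1]], vectors),
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    corpus = [s.split() for s in [
        "the cat eats fish", "the dog eats meat",
        "a cat drinks milk", "a dog drinks water",
    ]]
    for cluster in cluster_words(corpus, 5):
        print(sorted(cluster))
```

On this toy corpus the sketch groups {a, the}, {cat, dog}, {fish, meat}, {milk, water}, and {drinks, eats}. It rescans every pair of clusters at each merge, so it is not meant to reflect the speed advantage reported in the abstract.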
Keywords
"Clustering algorithms","Data models","Predictive models","Adaptation models","Semantics","Algorithm design and analysis","Probabilistic logic"
Publisher
IEEE
Conference_Titel
Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2015 7th International Conference on
Print_ISBN
978-1-4799-8645-3
Type
conf
DOI
10.1109/IHMSC.2015.36
Filename
7334642