Word Clustering Algorithms Based on Word Similarity

Author

Lichi Yuan

Author_Institution

Sch. of Inf. Technol., Jiangxi Univ. of Finance &

Volume

fYear

2015

Firstpage

Lastpage

Abstract

Category-based statistical language models are an important way to address the problem of sparse data, but they face two bottlenecks: (1) word clustering, where it is hard to find a clustering method that performs well without requiring a large amount of computation; and (2) class-based methods always lose some predictive ability when adapting to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented. Building on word similarity, a definition of word-set similarity is then given. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218 (a relative reduction of about 23%).
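
For readers unfamiliar with the terms in the abstract, the sketch below shows how perplexity is typically computed for a class-based bigram model, using the standard factorization P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i). It is an illustration only and not the paper's algorithm: the abstract does not give the similarity definitions, so no clustering step is attempted here. The word-to-class mapping, the toy sentence, and the function names are all assumptions made for this example, and the estimates are unsmoothed.

```python
import math
from collections import Counter


def train_class_bigram(tokens, word2class):
    """Collect counts for P(w_i | w_{i-1}) = P(c_i | c_{i-1}) * P(w_i | c_i)."""
    classes = [word2class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_bigrams = Counter(zip(classes, classes[1:]))
    # How often each class occurs as a history (every position except the last).
    history_counts = Counter(classes[:-1])
    return word_counts, class_counts, class_bigrams, history_counts


def perplexity(tokens, word2class, model):
    """Perplexity of `tokens` under unsmoothed maximum-likelihood estimates."""
    word_counts, class_counts, class_bigrams, history_counts = model
    log_prob = 0.0
    n = 0
    for prev, cur in zip(tokens, tokens[1:]):
        c_prev, c_cur = word2class[prev], word2class[cur]
        p_class = class_bigrams[(c_prev, c_cur)] / history_counts[c_prev]
        p_word = word_counts[cur] / class_counts[c_cur]
        log_prob += math.log2(p_class * p_word)
        n += 1
    return 2 ** (-log_prob / n)


# Hypothetical toy data: a hand-made clustering, not one produced by the paper.
tokens = "the cat sat the dog sat".split()
word2class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V"}

model = train_class_bigram(tokens, word2class)
pp = perplexity(tokens, word2class, model)
print(round(pp, 4))
```

Because the estimates are unsmoothed, this sketch only works on text whose words and class transitions were all seen in training. A real system would add smoothing and would evaluate on held-out text, which is the setting in which figures such as 283 and 218 are normally reported.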

Keywords

"Clustering algorithms","Data models","Predictive models","Adaptation models","Semantics","Algorithm design and analysis","Probabilistic logic"

Publisher

IEEE

Conference_Titel

Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2015 7th International Conference on

Print_ISBN

978-1-4799-8645-3

Type

conf

DOI

10.1109/IHMSC.2015.36

Filename

7334642

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=3695943