Regional Information Center for Science and Technology (RICeST) - Word Clustering Algorithms Based on Word Similarity

DocumentCode :

3695943

Title :

Word Clustering Algorithms Based on Word Similarity

Author :

Lichi Yuan

Author_Institution :

Sch. of Inf. Technol., Jiangxi Univ. of Finance &

Volume :

fYear :

2015

Firstpage :

Lastpage :

Abstract :

Category-based (class-based) statistical language models are an important way to address the problem of sparse data, but they face two bottlenecks: (1) word clustering, since it is hard to find a clustering method that performs well without a large amount of computation; and (2) loss of predictive power, since class-based methods always sacrifice some ability to adapt to text from different domains. To address these problems, a definition of word similarity based on mutual information is presented, and from it a definition of similarity between word sets is given. Experiments show that the similarity-based word clustering algorithm outperforms the conventional greedy clustering method in both speed and performance, reducing perplexity from 283 to 218.
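
The abstract does not state the similarity formula, so the following is only an illustrative sketch and not the author's definition. It assumes that word similarity is the cosine between vectors of positive pointwise mutual information (PMI) with left and right neighbours, that word set similarity is the average pairwise word similarity, and that clusters are merged bottom-up. All function names (pmi_table, context_vector, word_similarity, set_similarity, cluster) are hypothetical.

```python
import math
from collections import Counter


def pmi_table(tokens):
    """PMI (base 2) for every adjacent word pair observed in the token list."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        pmi[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return pmi


def context_vector(word, pmi):
    """Sparse vector of positive PMI values with left (L) and right (R) neighbours."""
    vec = {}
    for (w1, w2), value in pmi.items():
        if value <= 0:
            continue
        if w1 == word:
            vec[("R", w2)] = value
        if w2 == word:
            vec[("L", w1)] = value
    return vec


def word_similarity(a, b, pmi):
    """Assumed definition: cosine similarity of the two PMI context vectors."""
    va, vb = context_vector(a, pmi), context_vector(b, pmi)
    dot = sum(v * vb[k] for k, v in va.items() if k in vb)
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def set_similarity(set_a, set_b, pmi):
    """Assumed definition: average pairwise similarity between two word sets."""
    pairs = [(a, b) for a in set_a for b in set_b]
    return sum(word_similarity(a, b, pmi) for a, b in pairs) / len(pairs)


def cluster(words, pmi, n_clusters):
    """Merge the two most similar word sets until n_clusters remain."""
    clusters = [{w} for w in words]
    while len(clusters) > n_clusters:
        candidates = (
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        i, j = max(
            candidates,
            key=lambda ij: set_similarity(clusters[ij[0]], clusters[ij[1]], pmi),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Toy corpus, invented purely for illustration.
tokens = "the cat sat on the mat the dog sat on the rug".split()
pmi = pmi_table(tokens)
clusters = cluster(["cat", "dog", "on"], pmi, n_clusters=2)
```

In this toy corpus "cat" and "dog" share the same neighbours, so they are merged first. The pairwise search is quadratic in the number of clusters at every step, so the sketch says nothing about the speed gains reported in the abstract.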

Keywords :

"Clustering algorithms","Data models","Predictive models","Adaptation models","Semantics","Algorithm design and analysis","Probabilistic logic"

Publisher :

IEEE

Conference_Title :

Intelligent Human-Machine Systems and Cybernetics (IHMSC), 2015 7th International Conference on

Print_ISBN :

978-1-4799-8645-3

Type :

conf

DOI :

10.1109/IHMSC.2015.36

Filename :

7334642

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3695943