DocumentCode
738403
Title
A Large Probabilistic Semantic Network Based Approach to Compute Term Similarity
Author
Li, Peipei ; Wang, Haixun ; Zhu, Kenny Q. ; Wang, Zhongyuan ; Hu, Xuegang ; Wu, Xindong
Author_Institution
School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
Volume
27
Issue
10
fYear
2015
Firstpage
2604
Lastpage
2617
Abstract
Measuring semantic similarity between two terms is essential for a variety of text analytics and understanding applications. Currently, there are two main approaches to this task: knowledge-based and corpus-based. However, existing approaches are better suited to measuring semantic similarity between single words than between the more general multi-word expressions (MWEs), and they do not scale well. In contrast to these techniques, we propose an efficient and effective approach to semantic similarity using a large-scale semantic network. This semantic network is automatically acquired from billions of web documents. It consists of millions of concepts, which explicitly model the context of semantic relationships. In this paper, we first show how to map two terms into the concept space and compare their similarity there. Then, we introduce a clustering approach to orthogonalize the concept space in order to improve the accuracy of the similarity measure. Finally, we conduct extensive studies to demonstrate that our approach can accurately compute the semantic similarity between terms, including MWEs and ambiguous terms, and that it significantly outperforms 12 competing methods under the Pearson correlation coefficient. Moreover, our approach is much more efficient than all competing algorithms and can be used to compute semantic similarity at large scale.
Keywords
Clustering algorithms; Companies; Context; Google; Knowledge based systems; Semantics; Taxonomy; Term similarity; clustering; multi-word expression; semantic network
fLanguage
English
Journal_Title
Knowledge and Data Engineering, IEEE Transactions on
Publisher
IEEE
ISSN
1041-4347
Type
jour
DOI
10.1109/TKDE.2015.2419673
Filename
7079385