DocumentCode :
3585820
Title :
Sentence similarity measuring by vector space model
Author :
Gunasinghe, U.L.D.N. ; De Silva, W.A.M. ; de Silva, N.H.N.D. ; Perera, A.S. ; Sashika, W.A.D. ; Premasiri, W.D.T.P.
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. of Moratuwa, Moratuwa, Sri Lanka
fYear :
2014
Firstpage :
185
Lastpage :
189
Abstract :
In natural language processing and text mining, measuring sentence similarity is an important task. There are three major approaches: measures based on the semantic structure of sentences, measures based on syntactic similarity, and hybrid measures. Syntactic methods take into account the words that co-occur in the two strings. Semantic measures consider the similarity between words based on a semantic net. In most cases the easiest way to calculate sentence similarity is to use syntactic measures, which do not consider the grammatical structure of sentences. However, some sentences have the same meaning but use different words. Considering both semantic and syntactic similarity can improve the quality of the measure compared with relying on either alone. This paper follows a sentence similarity algorithm developed from both syntactic and semantic similarity measures. The algorithm measures sentence similarity using a vector space model generated for the word nodes in the sentences. The implementation considers two types of relationships: the relationship between verbs in the sentence pair and the relationship between nouns in the sentence pair. A major advantage of this method is that it can be applied to sentences of variable length. The experiments and results section reports the scores obtained with this algorithm for a selected set of sentence pairs and compares them with human similarity ratings for the same pairs.
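The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: build a vector over the combined vocabulary of the two sentences, fill each component with a word-to-word similarity score, and compare nouns and verbs separately. Everything named here is an assumption made for illustration: the `TOY_SIM` table stands in for a WordNet-based word similarity, the cosine comparison and the `noun_weight` parameter are not taken from the paper, and the nouns and verbs are assumed to have been extracted already (for example by a part-of-speech tagger).

```python
import math

# Hypothetical toy scores standing in for a WordNet-based word similarity.
TOY_SIM = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("drive", "steer")): 0.7,
}


def word_sim(a, b):
    """Similarity of two words in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def semantic_vector(words, vocab):
    """One component per vocabulary word: its best match among `words`."""
    return [max((word_sim(v, w) for w in words), default=0.0) for v in vocab]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_similarity(words_a, words_b):
    """Similarity of two word groups, or None if both groups are empty."""
    vocab = sorted(set(words_a) | set(words_b))
    if not vocab:
        return None
    return cosine(semantic_vector(words_a, vocab),
                  semantic_vector(words_b, vocab))


def sentence_similarity(nouns_a, verbs_a, nouns_b, verbs_b, noun_weight=0.5):
    """Weighted mix of noun and verb similarity (the weighting is an assumption)."""
    noun_score = group_similarity(nouns_a, nouns_b)
    verb_score = group_similarity(verbs_a, verbs_b)
    if noun_score is None and verb_score is None:
        return 0.0
    if noun_score is None:
        return verb_score
    if verb_score is None:
        return noun_score
    return noun_weight * noun_score + (1.0 - noun_weight) * verb_score


if __name__ == "__main__":
    # "The man drives the car" vs. "The man steers the automobile"
    print(sentence_similarity(["man", "car"], ["drive"],
                              ["man", "automobile"], ["steer"]))
```

Because the vectors are built over the union of the words in both sentences, the two sentences may contain different numbers of nouns and verbs, which mirrors the variable-length property the abstract mentions.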
Keywords :
data mining; natural language processing; text analysis; vectors; hybrid measures; natural language processing; semantic net; semantic structure; sentence similarity measurement; syntactic similarity measure; text mining related works; vector space model; Manganese; Semantic Similarity; Sentence Similarity; StanfordCoreNLP; Syntactic Similarity; Word Similarity; WordNet;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Advances in ICT for Emerging Regions (ICTer), 2014 International Conference on
Print_ISBN :
978-1-4799-7731-4
Type :
conf
DOI :
10.1109/ICTER.2014.7083899
Filename :
7083899