Regional Information Center for Science and Technology - Sentence similarity measuring by vector space model

DocumentCode :

3585820

Title :

Sentence similarity measuring by vector space model

Author :

Gunasinghe, U.L.D.N. ; De Silva, W.A.M. ; de Silva, N.H.N.D. ; Perera, A.S. ; Sashika, W.A.D. ; Premasiri, W.D.T.P.

Author_Institution :

Dept. of Comput. Sci. & Eng., Univ. of Moratuwa, Moratuwa, Sri Lanka

fYear :

2014

Firstpage :

185

Lastpage :

189

Abstract :

Measuring sentence similarity is an important task in natural language processing and text mining. Approaches fall into three major branches: measures based on the semantic structure of sentences, measures based on syntactic similarity, and hybrid measures. Syntactic methods take into account the words that co-occur in the two strings, while semantic methods consider the semantic similarity between words based on a semantic net. Syntactic measures are usually the easiest way to calculate sentence similarity, but they do not consider the grammatical structure of sentences. However, some sentences express the same meaning with different words. Considering both semantic and syntactic similarity improves the quality of the measure compared with relying on either one alone. This paper follows a sentence similarity algorithm that is based on both syntactic and semantic similarity measures. The algorithm measures sentence similarity using a vector space model generated for the word nodes in the sentences. The implementation considers two types of relationships: the relationship between verbs in the sentence pair and the relationship between nouns in the sentence pair. A major advantage of this method is that it can be used for sentences of variable length. The experiments and results section reports the scores obtained with this algorithm for a selected set of sentence pairs and compares them with human similarity ratings for the same pairs.
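
The hybrid idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes pre-tokenized sentences, does not treat nouns and verbs separately, and replaces the WordNet-based word similarity with a hypothetical lookup table (`TOY_SIMILARITY`). The function names and the weighting parameter `alpha` are also assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical word-similarity table standing in for a semantic net such as WordNet.
TOY_SIMILARITY = {("car", "automobile"): 0.9}


def toy_word_sim(w1, w2):
    """Return 1.0 for identical words, a table value for known pairs, else 0.0."""
    if w1 == w2:
        return 1.0
    return TOY_SIMILARITY.get((w1, w2), TOY_SIMILARITY.get((w2, w1), 0.0))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def syntactic_similarity(s1, s2):
    """Word-overlap similarity: cosine over raw term counts."""
    return cosine(Counter(s1), Counter(s2))


def semantic_vector(words, vocabulary, word_sim):
    """For each vocabulary word, keep its best similarity to any word in `words`."""
    return {v: max((word_sim(v, w) for w in words), default=0.0) for v in vocabulary}


def semantic_similarity(s1, s2, word_sim=toy_word_sim):
    """Cosine between the two sentences' vectors over their joint vocabulary."""
    vocabulary = set(s1) | set(s2)
    return cosine(
        semantic_vector(s1, vocabulary, word_sim),
        semantic_vector(s2, vocabulary, word_sim),
    )


def hybrid_similarity(s1, s2, word_sim=toy_word_sim, alpha=0.5):
    """Weighted mix of semantic and syntactic scores (alpha is an assumed weight)."""
    return alpha * semantic_similarity(s1, s2, word_sim) + (
        1.0 - alpha
    ) * syntactic_similarity(s1, s2)


if __name__ == "__main__":
    a = ["the", "car", "stopped"]
    b = ["the", "automobile", "stopped"]
    print("syntactic:", syntactic_similarity(a, b))
    print("semantic: ", semantic_similarity(a, b))
    print("hybrid:   ", hybrid_similarity(a, b))
```

Because the vectors are built over the joint vocabulary of the two sentences, the sketch works for sentences of different lengths, which is the property the abstract highlights.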

Keywords :

data mining; natural language processing; text analysis; vectors; hybrid measures; semantic net; semantic structure; sentence similarity measurement; syntactic similarity measure; text mining related works; vector space model; Manganese; Semantic Similarity; Sentence Similarity; StanfordCoreNLP; Syntactic Similarity; Word Similarity; WordNet;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Advances in ICT for Emerging Regions (ICTer), 2014 International Conference on

Print_ISBN :

978-1-4799-7731-4

Type :

conf

DOI :

10.1109/ICTER.2014.7083899

Filename :

7083899

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3585820