Regional Information Center for Science and Technology - Efficient search engine approach for measuring similarity between words: Using page count and snippets

DocumentCode :

3768203

Title :

Efficient search engine approach for measuring similarity between words: Using page count and snippets

Author :

P. Murugesan;K. Malathi

Author_Institution :

PG student, Computer Science and Engineering, Indian Institute of Information Technology, Srirangam, Tiruchirappalli

Year :

2015

Firstpage :

Lastpage :

Abstract :

Web mining involves activities such as document clustering and community mining that are performed on the web. Such tasks require measuring the semantic similarity between words, which makes web mining activities easier to perform in many applications. Accurately measuring the semantic similarity between any two words is a difficult task. A new approach to measuring similarity between words is based on text snippets and page counts. Both are taken from the results of a search engine such as Google. Lexical patterns are extracted from the text snippets, and word co-occurrence measures are defined using the page counts. The results of the two are then combined. Moreover, pattern extraction and pattern clustering algorithms are used to find the various relationships between any two given words. Support Vector Machines are used to optimize the result. The empirical results show that the techniques yield results that compare favorably with human ratings and are accurate in web mining activities. Semantic similarity refers to the concept by which a set of documents, or words within a document, is assigned a weight based on meaning. The accurate measurement of such similarity plays an important role in natural language processing.
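
As an illustration of what a page-count-based co-occurrence measure can look like, the sketch below computes a Jaccard coefficient and a pointwise mutual information score from search-engine hit counts. The abstract does not state which measures the authors use, so the two formulas, the function names, the cut-off of 5 for rare co-occurrences, and the index size passed as total_pages are all assumptions made for this example. The page counts in the usage note are invented.

```python
import math


def web_jaccard(h_p, h_q, h_pq, min_cooccurrence=5):
    """Jaccard coefficient over page counts (hypothetical helper).

    h_p, h_q: page counts for each word queried alone.
    h_pq: page count for the conjunctive query "P AND Q".
    min_cooccurrence: assumed cut-off; rare co-occurrences are
    treated as noise and scored 0.
    """
    if h_pq <= min_cooccurrence:
        return 0.0
    return h_pq / (h_p + h_q - h_pq)


def web_pmi(h_p, h_q, h_pq, total_pages, min_cooccurrence=5):
    """Pointwise mutual information over page counts (hypothetical helper).

    total_pages: assumed number of pages indexed by the search engine,
    used to turn counts into probabilities.
    """
    if h_pq <= min_cooccurrence:
        return 0.0
    p_joint = h_pq / total_pages
    p_p = h_p / total_pages
    p_q = h_q / total_pages
    return math.log2(p_joint / (p_p * p_q))
```

With invented counts such as web_jaccard(100, 50, 25) the Jaccard score is 25 / (100 + 50 - 25) = 0.2. Scores of this kind would then be combined with snippet-based pattern features, for example as inputs to a Support Vector Machine, as the abstract outlines.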

Keywords :

"Semantics","Search engines","Engines","Web search","Pattern clustering","Clustering algorithms","Mutual information"

Publisher :

IEEE

Conference_Title :

Green Engineering and Technologies (IC-GET), 2015 Online International Conference on

Type :

conf

DOI :

10.1109/GET.2015.7453830

Filename :

7453830

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3768203