Regional Information Center for Science and Technology - Creating a Phrase Similarity Graph from Wikipedia

DocumentCode :

1787408

Title :

Creating a Phrase Similarity Graph from Wikipedia

Author :

Stanchev, Lubomir

Author_Institution :

Comput. Sci. Dept., Indiana Univ.-Purdue Univ. Fort Wayne, Fort Wayne, IN, USA

fYear :

2014

fDate :

16-18 June 2014

Firstpage :

Lastpage :

Abstract :

The paper addresses the problem of modeling the relationship between phrases in English using a similarity graph. The mathematical model stores the strength of the relationship between two phrases as a decimal number. The graph is built from both structured data in Wikipedia, such as the fact that the Wikipedia page titled "Dog" belongs to the Wikipedia category "Domesticated animals", and textual descriptions, such as the fact that the Wikipedia page titled "Dog" contains the word "wolf" thirty-one times. The quality of the graph data is validated by comparing the phrase-pair similarities that our software computes from the graph with the results of studies performed with human subjects. To the best of our knowledge, our software produces a better correlation with the results of both the Miller and Charles study and the WordSimilarity-353 study than any other published research.
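The abstract describes two ingredients: a weighted graph whose edges come from category membership and word occurrences, and an evaluation that correlates graph-based similarity scores with human ratings. The record does not give the paper's actual weighting formulas or similarity algorithm, so the following is only a minimal sketch under stated assumptions. The names `PhraseGraph`, `add_category_membership`, `add_word_counts`, `similarity`, `ranks`, `pearson` and `spearman` are hypothetical; the fixed category weight of 0.5, the normalized word-frequency weights, and the "best direct edge or best two-hop product" similarity score are illustrative choices, not the author's method. Spearman's rank correlation is used here as one common way to compare against human judgments; the record does not say which coefficient the paper reports.

```python
from collections import defaultdict
from math import sqrt


class PhraseGraph:
    """Hypothetical undirected graph; edge weights in [0, 1] approximate relatedness."""

    def __init__(self) -> None:
        # adjacency[a][b] == adjacency[b][a] == weight of the edge between a and b
        self.adjacency: dict[str, dict[str, float]] = defaultdict(dict)

    def add_edge(self, a: str, b: str, weight: float) -> None:
        # If the same pair is added twice, keep the strongest evidence.
        w = max(weight, self.adjacency[a].get(b, 0.0))
        self.adjacency[a][b] = w
        self.adjacency[b][a] = w

    def add_category_membership(self, page: str, category: str, weight: float = 0.5) -> None:
        # Assumption: every "page belongs to category" link gets the same fixed weight.
        self.add_edge(page, category, weight)

    def add_word_counts(self, page: str, counts: dict[str, int]) -> None:
        # Assumption: a word's weight is its share of all counted occurrences on the page.
        total = sum(counts.values())
        for word, n in counts.items():
            self.add_edge(page, word, n / total)

    def similarity(self, a: str, b: str) -> float:
        # Assumption: score = best direct edge, or best two-hop path (product of weights).
        neighbors = self.adjacency.get(a, {})
        best = neighbors.get(b, 0.0)
        for mid, w1 in neighbors.items():
            best = max(best, w1 * self.adjacency[mid].get(b, 0.0))
        return best


def ranks(values: list[float]) -> list[float]:
    # 1-based ranks; tied values share the average of the positions they occupy.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(xs: list[float], ys: list[float]) -> float:
    # Pearson correlation; undefined (division by zero) if either list is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)


def spearman(xs: list[float], ys: list[float]) -> float:
    # Spearman's rho = Pearson correlation of the (tie-averaged) ranks.
    return pearson(ranks(xs), ranks(ys))


if __name__ == "__main__":
    g = PhraseGraph()
    g.add_category_membership("Dog", "Domesticated animals")
    g.add_category_membership("Cat", "Domesticated animals")
    # "wolf": 31 comes from the abstract; "pet": 9 is a made-up count.
    g.add_word_counts("Dog", {"wolf": 31, "pet": 9})

    pairs = [("Dog", "wolf"), ("Dog", "Cat"), ("Cat", "wolf")]
    machine = [g.similarity(a, b) for a, b in pairs]
    # Made-up human ratings for illustration only (not Miller-Charles or WordSimilarity-353 data).
    human = [3.5, 3.0, 1.0]
    print(machine, spearman(machine, human))
```

In this toy graph "Dog" and "Cat" are related only through the shared category node, so their score comes entirely from the two-hop path; a real system would need a principled weighting and a richer path search, which the record does not describe.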

Keywords :

data structures; graph theory; natural language processing; storage management; Wikipedia category domesticated animals; Wikipedia page; WordSimilarity-353 study; data structure; decimal number; dog; graph data quality; mathematical model; phrase similarity graph; textual descriptions; wolf; Computers; Electronic publishing; Encyclopedias; Ice; Internet; Semantics; semantic search; similarity graph;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Semantic Computing (ICSC), 2014 IEEE International Conference on

Conference_Location :

Newport Beach, CA

Print_ISBN :

978-1-4799-4002-8

Type :

conf

DOI :

10.1109/ICSC.2014.22

Filename :

6882003

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1787408