Regional Information Center for Science and Technology (RICeST) - Sentence similarity based on semantic nets and corpus statistics

DocumentCode :

984438

Title :

Sentence similarity based on semantic nets and corpus statistics

Author :

Li, Yuhua ; McLean, David ; Bandar, Zuhair A. ; O'Shea, James D. ; Crockett, Keeley

Author_Institution :

Sch. of Comput. & Intelligent Syst., Ulster Univ., Londonderry

Volume :

Issue :

Year :

2006

Firstpage :

1138

Lastpage :

1150

Abstract :

Sentence similarity measures play an increasingly important role in text-related research and applications in areas such as text mining, Web page retrieval, and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. These methods process sentences in a very high-dimensional space and are consequently inefficient, require human input, and are not adaptable to some application domains. This paper focuses directly on computing the similarity between very short texts of sentence length. It presents an algorithm that takes account of semantic information and word order information implied in the sentences. The semantic similarity of two sentences is calculated using information from a structured lexical database and from corpus statistics. The use of a lexical database enables our method to model human common sense knowledge and the incorporation of corpus statistics allows our method to be adaptable to different domains. The proposed method can be used in a variety of applications that involve text knowledge representation and discovery. Experiments on two sets of selected sentence pairs demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition.
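
The abstract describes combining a semantic score (from a lexical database and corpus statistics) with a word order score. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. The toy similarity table and word counts stand in for a real lexical database and corpus, and the weighting formula, the `threshold` and `delta` parameters, and all function names are assumptions that do not appear in this record.

```python
import math

# Hypothetical toy data. A real system would query a lexical database
# and corpus frequency counts instead of these hand-made tables.
TOY_WORD_SIM = {frozenset(("dog", "puppy")): 0.8,
                frozenset(("runs", "walks")): 0.6}
TOY_COUNTS = {"the": 900, "a": 800, "dog": 20, "puppy": 5}
TOY_TOTAL = 1000


def word_sim(a, b):
    """Similarity in [0, 1] between two words (toy lookup)."""
    if a == b:
        return 1.0
    return TOY_WORD_SIM.get(frozenset((a, b)), 0.0)


def info_weight(word):
    """Rarer words get weights closer to 1 (assumed weighting formula)."""
    return 1.0 - math.log(TOY_COUNTS.get(word, 0) + 1) / math.log(TOY_TOTAL + 1)


def best_match(target, words):
    """Return (1-based position, similarity) of the word most similar to target."""
    best_pos, best_sim = 0, 0.0
    for pos, w in enumerate(words, start=1):
        s = word_sim(target, w)
        if s > best_sim:
            best_pos, best_sim = pos, s
    return best_pos, best_sim


def semantic_vector(words, joint):
    """One weighted similarity entry per word of the joint word set."""
    vec = []
    for j in joint:
        pos, sim = best_match(j, words)
        weight = info_weight(j) * info_weight(words[pos - 1]) if pos else 0.0
        vec.append(sim * weight)
    return vec


def order_vector(words, joint, threshold=0.4):
    """Position in `words` of the best match for each joint word, or 0."""
    vec = []
    for j in joint:
        pos, sim = best_match(j, words)
        vec.append(pos if sim >= threshold else 0)
    return vec


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    denom = norm(u) * norm(v)
    return sum(a * b for a, b in zip(u, v)) / denom if denom else 0.0


def order_similarity(r1, r2):
    """1 minus the normalized distance between two order vectors."""
    total = norm([a + b for a, b in zip(r1, r2)])
    if total == 0:
        return 1.0
    return 1.0 - norm([a - b for a, b in zip(r1, r2)]) / total


def sentence_similarity(s1, s2, delta=0.85):
    """Blend semantic and word order similarity; delta is an assumed weight."""
    w1, w2 = s1.lower().split(), s2.lower().split()
    joint = list(dict.fromkeys(w1 + w2))  # distinct words, first-seen order
    sem = cosine(semantic_vector(w1, joint), semantic_vector(w2, joint))
    order = order_similarity(order_vector(w1, joint), order_vector(w2, joint))
    return delta * sem + (1.0 - delta) * order
```

Under these assumptions, `sentence_similarity("dog bites man", "man bites dog")` scores below an identical pair because only the word order component differs, while two sentences with no related words score near zero.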

Keywords :

computational linguistics; data mining; database management systems; knowledge representation; text analysis; Web page retrieval; corpus statistics; dialogue system; semantic nets; sentence similarity computing; structured lexical database; text knowledge discovery; text knowledge representation; text mining; Area measurement; Databases; Humans; Image retrieval; Knowledge representation; Natural language processing; Natural languages; Statistics; Text mining; Web pages; Sentence similarity; corpus; natural language processing; semantic nets; word similarity

Language :

English

Journal_Title :

IEEE Transactions on Knowledge and Data Engineering

Publisher :

IEEE

ISSN :

1041-4347

Type :

jour

DOI :

10.1109/TKDE.2006.130

Filename :

1644735

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=984438