Title :
A Combination-based Semantic Similarity Measure using Multiple Information Sources
Author :
Nguyen, Hoa A. ; Al-Mubaid, Hisham
Author_Institution :
Dept. of Comput. Sci., Houston Univ., TX
Abstract :
Semantic similarity techniques aim to determine how similar two concepts, or terms, are according to a given ontology. This paper proposes a method for measuring semantic similarity/distance between terms. The measure combines the strengths and compensates for the weaknesses of existing measures that use an ontology as the primary source. In addition to the path length feature, the proposed measure uses a new feature, common specificity (CSpec). The CSpec feature is derived from (1) the information content of concepts and (2) the information content of the ontology given a corpus. We evaluated the proposed measure on a benchmark test set of term pairs scored for similarity by human experts. The experimental results show that our similarity measure is effective and outperforms the existing measures: it gives the best correlation with human scores (0.874) on the benchmark test set.
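As a rough illustration of how a path length feature and a common specificity feature might be combined, the Python sketch below uses a toy ontology and made-up corpus counts. The abstract does not give the exact definitions, so the formulas here are assumptions chosen for illustration: IC(c) = -log p(c), CSpec = IC_max - IC(LCS), and distance = log((path - 1)^alpha * CSpec^beta + k), with path length counted in nodes. All names, counts, and parameter defaults are hypothetical.

```python
import math

# Hypothetical toy is-a hierarchy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "oak": "plant",
}

# Made-up corpus counts for each concept's own occurrences.
COUNTS = {"entity": 1, "animal": 4, "plant": 3, "dog": 10, "cat": 8, "oak": 6}


def ancestors(c):
    """Return the chain from concept c up to the root, c included."""
    chain = []
    while c is not None:
        chain.append(c)
        c = PARENT[c]
    return chain


def cumulative_counts():
    """Propagate counts upward so each concept also counts its descendants."""
    total = {c: 0 for c in PARENT}
    for c, n in COUNTS.items():
        for a in ancestors(c):
            total[a] += n
    return total


TOTAL = cumulative_counts()
N = TOTAL["entity"]
# Assumed information content: IC(c) = -log p(c), estimated from the corpus.
IC = {c: -math.log(TOTAL[c] / N) for c in PARENT}
# Assumed "information content of the ontology": the maximum IC of any concept.
IC_MAX = max(IC.values())


def lcs(c1, c2):
    """Least common subsumer: the deepest ancestor shared by c1 and c2."""
    shared = set(ancestors(c2))
    return next(a for a in ancestors(c1) if a in shared)


def path_length(c1, c2):
    """Number of nodes on the shortest is-a path between c1 and c2."""
    l = lcs(c1, c2)
    return ancestors(c1).index(l) + ancestors(c2).index(l) + 1


def cspec(c1, c2):
    """Assumed common specificity: smaller when the LCS is more specific."""
    return IC_MAX - IC[lcs(c1, c2)]


def sem_dist(c1, c2, alpha=1.0, beta=1.0, k=1.0):
    """Assumed combination of the two features into one distance value."""
    return math.log((path_length(c1, c2) - 1) ** alpha * cspec(c1, c2) ** beta + k)
```

Under these assumptions, `sem_dist("dog", "cat")` is smaller than `sem_dist("dog", "oak")`, because the first pair is joined by a shorter path and shares a more specific subsumer. A distance of this kind would then be compared against human similarity scores, for example with a correlation coefficient.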
Keywords :
formal languages; information analysis; ontologies (artificial intelligence); combination-based semantic similarity measure; common specificity; concept information content; multiple information source; ontology information content; path length feature; term semantic distance measure; Benchmark testing; Biomedical measurements; Computer science; Humans; Information retrieval; Lakes; Length measurement; Natural languages; Ontologies; Probability; Semantic similarity; information retrieval; natural language semantics; ontology;
Conference_Title :
Information Reuse and Integration, 2006 IEEE International Conference on
Conference_Location :
Waikoloa Village, HI
Print_ISBN :
0-7803-9788-6
DOI :
10.1109/IRI.2006.252484