DocumentCode :
2916481
Title :
Semantic similarity measure in biomedical domain leverage Web Search Engine
Author :
Chen, Chi-Huang ; Hsieh, Sheau-Ling ; Weng, Yung-Ching ; Chang, Wen-Yung ; Lai, Feipei
Author_Institution :
Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear :
2010
fDate :
Aug. 31 2010-Sept. 4 2010
Firstpage :
4436
Lastpage :
4439
Abstract :
Semantic similarity measures play an essential role in information retrieval and natural language processing. In this paper we propose a page-count-based semantic similarity measure and apply it to biomedical domains. Previous research on Semantic Web applications has deployed various semantic similarity measures. Despite the usefulness of these measures in those applications, measuring semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by a Web search engine. We define various similarity scores for two given terms P and Q, using the page counts for the queries P, Q, and P AND Q. Moreover, we propose a novel approach to computing semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated using support vector machines to make the combined semantic similarity measure more robust. Experimental results on two datasets yield correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen et al. with physician scores, and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
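The abstract does not say which page-count scores the authors define, so the following is only a minimal sketch. It assumes four co-occurrence measures commonly used with page counts (Jaccard, Dice, overlap, and pointwise mutual information). The function names, the index-size parameter `n`, and the example counts are all hypothetical; the resulting feature vector is the kind of input a support vector machine could combine.

```python
import math


def web_jaccard(p, q, pq):
    """Jaccard coefficient: hits(P AND Q) / (hits(P) + hits(Q) - hits(P AND Q))."""
    union = p + q - pq
    return pq / union if union > 0 else 0.0


def web_dice(p, q, pq):
    """Dice coefficient: 2 * hits(P AND Q) / (hits(P) + hits(Q))."""
    total = p + q
    return 2 * pq / total if total > 0 else 0.0


def web_overlap(p, q, pq):
    """Overlap coefficient: hits(P AND Q) / min(hits(P), hits(Q))."""
    smaller = min(p, q)
    return pq / smaller if smaller > 0 else 0.0


def web_pmi(p, q, pq, n):
    """Pointwise mutual information (base 2); n is an assumed index size."""
    if p == 0 or q == 0 or pq == 0:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))


def similarity_features(p, q, pq, n):
    """Collect the four scores into one feature vector (order is arbitrary)."""
    return [
        web_jaccard(p, q, pq),
        web_dice(p, q, pq),
        web_overlap(p, q, pq),
        web_pmi(p, q, pq, n),
    ]


if __name__ == "__main__":
    # Hypothetical page counts for P, Q, and "P AND Q"; not real search data.
    print(similarity_features(p=100, q=50, pq=25, n=1000))
```

In this sketch the page counts are passed in as plain integers, so the scoring logic stays independent of any particular search engine API.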
Keywords :
Internet; medical information systems; natural language processing; search engines; semantic Web; semantic networks; support vector machines; Web search engine; biomedical domains; correlation coefficients; information retrieval; lexico-syntactic patterns; natural language processing; page counts; page-count-based semantic similarity; similarity scores; support vector machines; Biomedical measurements; Correlation; Kernel; Medical services; Semantics; Support vector machine classification; Training; Data Mining; Electronic Health Records; Health Records, Personal; Internet; Natural Language Processing; Pattern Recognition, Automated; Semantics;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE
Conference_Location :
Buenos Aires
ISSN :
1557-170X
Print_ISBN :
978-1-4244-4123-5
Type :
conf
DOI :
10.1109/IEMBS.2010.5626008
Filename :
5626008