DocumentCode
2916481
Title
Semantic similarity measure in biomedical domain leverage Web Search Engine
Author
Chen, Chi-Huang ; Hsieh, Sheau-Ling ; Weng, Yung-Ching ; Chang, Wen-Yung ; Lai, Feipei
Author_Institution
Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei, Taiwan
fYear
2010
fDate
Aug. 31 2010-Sept. 4 2010
Firstpage
4436
Lastpage
4439
Abstract
Semantic similarity measures play an essential role in information retrieval and natural language processing. In this paper we propose a page-count-based semantic similarity measure and apply it to the biomedical domain. Previous research on Semantic Web applications has deployed various semantic similarity measures. Despite the usefulness of these measures in those applications, measuring semantic similarity between two terms remains a challenging task. The proposed method exploits page counts returned by a Web search engine. We define various similarity scores for two given terms P and Q, using the page counts for the queries P, Q, and P AND Q. Moreover, we propose a novel approach to computing semantic similarity using lexico-syntactic patterns with page counts. These different similarity scores are integrated using support vector machines to improve the robustness of the semantic similarity measure. Experimental results on two datasets yield correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen et al. with physician scores, and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
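The abstract does not say which similarity scores are computed from the page counts for P, Q, and P AND Q. As an illustration only, the sketch below uses four common co-occurrence measures (Jaccard, Dice, overlap, and pointwise mutual information). The function name, the choice of measures, and the index-size parameter n_total are assumptions made here, not details taken from the paper.

```python
import math


def page_count_scores(n_p, n_q, n_pq, n_total=1e10):
    """Illustrative co-occurrence scores from search-engine page counts.

    n_p, n_q -- page counts for the queries P and Q
    n_pq     -- page count for the conjunctive query P AND Q
    n_total  -- assumed number of indexed pages (hypothetical value),
                used only to turn counts into probabilities for PMI
    """
    # With no co-occurrence (or an unseen term) every score is defined as 0.
    if n_p == 0 or n_q == 0 or n_pq == 0:
        return {"jaccard": 0.0, "dice": 0.0, "overlap": 0.0, "pmi": 0.0}

    jaccard = n_pq / (n_p + n_q - n_pq)
    dice = 2 * n_pq / (n_p + n_q)
    overlap = n_pq / min(n_p, n_q)
    # PMI: log2 of the joint probability over the product of the marginals.
    pmi = math.log2((n_pq / n_total) / ((n_p / n_total) * (n_q / n_total)))

    return {"jaccard": jaccard, "dice": dice, "overlap": overlap, "pmi": pmi}


if __name__ == "__main__":
    # Made-up counts, for demonstration only.
    print(page_count_scores(100, 50, 25, n_total=1000))
```

In a setup like the one the abstract outlines, scores of this kind would form a feature vector for each term pair, and a support vector machine would be trained on those vectors. The features and training procedure the authors actually used are not given in this record.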
Keywords
Internet; medical information systems; natural language processing; search engines; semantic Web; semantic networks; support vector machines; Web search engine; biomedical domains; correlation coefficients; information retrieval; lexico-syntactic patterns; natural language processing; page counts; page-count-based semantic similarity; similarity scores; support vector machines; Biomedical measurements; Correlation; Kernel; Medical services; Semantics; Support vector machine classification; Training; Data Mining; Electronic Health Records; Health Records, Personal; Internet; Natural Language Processing; Pattern Recognition, Automated; Semantics;
fLanguage
English
Publisher
IEEE
Conference_Titel
Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE
Conference_Location
Buenos Aires
ISSN
1557-170X
Print_ISBN
978-1-4244-4123-5
Type
conf
DOI
10.1109/IEMBS.2010.5626008
Filename
5626008