• DocumentCode
    2916481
  • Title
    Semantic similarity measure in biomedical domain leverage Web Search Engine
  • Author
    Chen, Chi-Huang ; Hsieh, Sheau-Ling ; Weng, Yung-Ching ; Chang, Wen-Yung ; Lai, Feipei
  • Author_Institution
    Dept. of Electr. Eng., Nat. Taiwan Univ., Taipei, Taiwan
  • fYear
    2010
  • fDate
    Aug. 31 2010-Sept. 4 2010
  • Firstpage
    4436
  • Lastpage
    4439
  • Abstract
    Semantic similarity measures play an essential role in Information Retrieval and Natural Language Processing. In this paper we propose a page-count-based semantic similarity measure and apply it to the biomedical domain. Previous research on Semantic Web applications has deployed various semantic similarity measures. Despite the usefulness of these measures in those applications, measuring the semantic similarity between two terms remains a challenging task. The proposed method exploits the page counts returned by a Web search engine. We define various similarity scores for two given terms P and Q, using the page counts for the queries P, Q, and P AND Q. Moreover, we propose a novel approach that computes semantic similarity using lexico-syntactic patterns together with page counts. These similarity scores are integrated using support vector machines to make the combined semantic similarity measure more robust. Experiments on two datasets achieve correlation coefficients of 0.798 on the dataset provided by A. Hliaoutakis, 0.705 on the dataset provided by T. Pedersen et al. with physician scores, and 0.496 on the dataset provided by T. Pedersen et al. with expert scores.
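    The page-count scores described in the abstract can be illustrated with a short sketch. It is not the authors' implementation: the abstract does not give its formulas, so the code assumes four co-occurrence measures commonly computed from the counts H(P), H(Q) and H(P AND Q) (Web-Jaccard, Web-Overlap, Web-Dice, Web-PMI), plus a hypothetical index size N_INDEXED and a hypothetical noise threshold C_THRESHOLD. The lexico-syntactic pattern features and the SVM training step are omitted; feature_vector only builds the input such a model would receive. The pearson helper shows how predicted scores are typically compared with human ratings to obtain correlation coefficients like those reported.

```python
import math

# Hypothetical number of pages indexed by the search engine (an assumption;
# search engines do not publish an exact figure).
N_INDEXED = 1e10
# Hypothetical noise threshold: co-occurrence counts at or below it are treated as 0.
C_THRESHOLD = 5


def web_jaccard(p: float, q: float, pq: float) -> float:
    """Jaccard coefficient from page counts H(P), H(Q), H(P AND Q)."""
    if pq <= C_THRESHOLD:
        return 0.0
    return pq / (p + q - pq)


def web_overlap(p: float, q: float, pq: float) -> float:
    """Overlap (Simpson) coefficient: co-occurrences relative to the rarer term."""
    if pq <= C_THRESHOLD:
        return 0.0
    return pq / min(p, q)


def web_dice(p: float, q: float, pq: float) -> float:
    """Dice coefficient from page counts."""
    if pq <= C_THRESHOLD:
        return 0.0
    return 2 * pq / (p + q)


def web_pmi(p: float, q: float, pq: float, n: float = N_INDEXED) -> float:
    """Pointwise mutual information, treating page counts / n as probabilities."""
    if pq <= C_THRESHOLD:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))


def feature_vector(p: float, q: float, pq: float) -> list[float]:
    """Stack the page-count scores; a vector like this could be fed to an SVM."""
    return [
        web_jaccard(p, q, pq),
        web_overlap(p, q, pq),
        web_dice(p, q, pq),
        web_pmi(p, q, pq),
    ]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between predicted scores and human ratings."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Illustrative, made-up page counts (not real search-engine results).
print(feature_vector(1000, 2000, 500))
```

    The thresholded scores return 0 for very rare co-occurrences because a handful of chance co-occurrences on the Web says little about relatedness; the threshold value itself is a tunable assumption.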
  • Keywords
    Internet; medical information systems; natural language processing; search engines; semantic Web; semantic networks; support vector machines; Web search engine; biomedical domains; correlation coefficients; information retrieval; lexico-syntactic patterns; natural language processing; page counts; page-count-based semantic similarity; similarity scores; support vector machines; Biomedical measurements; Correlation; Kernel; Medical services; Semantics; Support vector machine classification; Training; Data Mining; Electronic Health Records; Health Records, Personal; Internet; Natural Language Processing; Pattern Recognition, Automated; Semantics;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE
  • Conference_Location
    Buenos Aires
  • ISSN
    1557-170X
  • Print_ISBN
    978-1-4244-4123-5
  • Type
    conf
  • DOI
    10.1109/IEMBS.2010.5626008
  • Filename
    5626008