• DocumentCode
    668299
  • Title
    Improving Gloss Vector Semantic Relatedness Measure by Integrating Pointwise Mutual Information: Optimizing Second-Order Co-occurrence Vectors Computed from Biomedical Corpus and UMLS
  • Author
    Pesaranghader, Ahmad ; Muthaiyah, Saravanan ; Pesaranghader, Ahmad
  • Author_Institution
    Fac. of Creative Multimedia, MMU, Cyberjaya, Malaysia
  • fYear
    2013
  • fDate
    4-6 Sept. 2013
  • Firstpage
    196
  • Lastpage
    201
  • Abstract
    Methods of semantic relatedness are essential for a wide range of tasks such as information retrieval and text mining. This paper, concerned with these automated methods, attempts to improve the Gloss Vector semantic relatedness measure for more reliable estimation of relatedness between two input concepts. Generally, this measure applies a frequency cut-off to bigrams in order to remove low- and high-frequency words, which usually do not end up being significant features. However, this naive cutting approach can lead to loss of valuable information. By employing pointwise mutual information (PMI) as a measure of association between features, we try to enforce the foregoing elimination step in a statistical fashion. Applying both approaches to the biomedical domain, using MEDLINE as the corpus, MeSH as the thesaurus, and an available reference standard of 311 concept pairs manually rated for semantic relatedness, we show that using PMI to remove insignificant features is a more effective approach than frequency cut-off.
  • Keywords
    data mining; information retrieval; medical computing; text analysis; MEDLINE; MeSH; PMI; UMLS; biomedical corpus; gloss vector semantic relatedness measure; naive cutting approach; pointwise mutual information; second-order co-occurrence vectors; text mining; unified medical language system; Biomedical measurement; Cutoff frequency; Frequency measurement; Semantics; Taxonomy; Unified modeling language; Vectors; Bioinformatics; Biomedical Text Mining; Computational Linguistics; Semantic Relatedness
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Informatics and Creative Multimedia (ICICM), 2013 International Conference on
  • Conference_Location
    Kuala Lumpur
  • Type
    conf
  • DOI
    10.1109/ICICM.2013.41
  • Filename
    6702809