Title :
Improving Gloss Vector Semantic Relatedness Measure by Integrating Pointwise Mutual Information: Optimizing Second-Order Co-occurrence Vectors Computed from Biomedical Corpus and UMLS
Author :
Pesaranghader, Ahmad ; Muthaiyah, Saravanan ; Pesaranghader, Ahmad
Author_Institution :
Fac. of Creative Multimedia, MMU, Cyberjaya, Malaysia
Abstract :
Methods for measuring semantic relatedness are essential for a wide range of tasks such as information retrieval and text mining. This paper, concerned with these automated methods, attempts to improve the Gloss Vector semantic relatedness measure so that it estimates the relatedness between two input concepts more reliably. Generally, this measure applies a frequency cut-off to bigrams in order to remove low- and high-frequency words, which usually do not end up being significant features. However, this naive cutting approach can lead to the loss of valuable information. By employing pointwise mutual information (PMI) as a measure of association between features, we try to carry out the foregoing elimination step in a statistical fashion. Applying both approaches to the biomedical domain, using MEDLINE as the corpus, MeSH as the thesaurus, and an available reference standard of 311 concept pairs manually rated for semantic relatedness, we show that using PMI to remove insignificant features is a more effective approach than the frequency cut-off.
Keywords :
data mining; information retrieval; medical computing; text analysis; MEDLINE; MeSH; PMI; UMLS; biomedical corpus; gloss vector semantic relatedness measure; naive cutting approach; pointwise mutual information; second-order co-occurrence vectors; text mining; unified medical language system; Biomedical measurement; Cutoff frequency; Frequency measurement; Semantics; Taxonomy; Unified modeling language; Vectors; Bioinformatics; Biomedical Text Mining; Computational Linguistics; Semantic Relatedness
Conference_Title :
Informatics and Creative Multimedia (ICICM), 2013 International Conference on
Conference_Location :
Kuala Lumpur
DOI :
10.1109/ICICM.2013.41