DocumentCode
668299
Title
Improving Gloss Vector Semantic Relatedness Measure by Integrating Pointwise Mutual Information: Optimizing Second-Order Co-occurrence Vectors Computed from Biomedical Corpus and UMLS
Author
Pesaranghader, Ahmad ; Muthaiyah, Saravanan ; Pesaranghader, Ali
Author_Institution
Fac. of Creative Multimedia, MMU, Cyberjaya, Malaysia
fYear
2013
fDate
4-6 Sept. 2013
Firstpage
196
Lastpage
201
Abstract
Methods of semantic relatedness are essential for a wide range of tasks such as information retrieval and text mining. This paper, concerned with these automated methods, attempts to improve the Gloss Vector semantic relatedness measure for more reliable estimation of relatedness between two input concepts. Generally, this measure applies a frequency cut-off to bigrams in order to remove low- and high-frequency words, which usually do not end up being significant features. However, this naive cutting approach can lead to loss of valuable information. By employing pointwise mutual information (PMI) as a measure of association between features, we try to enforce the foregoing elimination step in a statistical fashion. Applying both approaches to the biomedical domain, using MEDLINE as the corpus, MeSH as the thesaurus, and an available reference standard of 311 concept pairs manually rated for semantic relatedness, we show that PMI is a more effective approach for removing insignificant features than frequency cut-off.
Keywords
data mining; information retrieval; medical computing; text analysis; MEDLINE; MeSH; PMI; UMLS; biomedical corpus; gloss vector semantic relatedness measure; naive cutting approach; pointwise mutual information; second-order co-occurrence vectors; text mining; unified medical language system; Biomedical measurement; Cutoff frequency; Frequency measurement; Semantics; Taxonomy; Unified modeling language; Vectors; Bioinformatics; Biomedical Text Mining; Computational Linguistics; Semantic Relatedness
fLanguage
English
Publisher
ieee
Conference_Titel
Informatics and Creative Multimedia (ICICM), 2013 International Conference on
Conference_Location
Kuala Lumpur
Type
conf
DOI
10.1109/ICICM.2013.41
Filename
6702809