DocumentCode :
406335
Title :
A MeSH term based distance measure for document retrieval and labeling assistance
Author :
Ontrup, Jörg ; Nattkemper, Tim W. ; Gerstung, Olaf ; Ritter, Helge
Author_Institution :
Neuroinformatics Group, Bielefeld Univ., Germany
Volume :
2
fYear :
2003
fDate :
17-21 Sept. 2003
Firstpage :
1303
Abstract :
For biomedical and pharmaceutical research, the PUBMED database of the NLM (National Library of Medicine) has become a viable platform. It provides the means for profound investigations of past and related research in daily scientific work. One basic aspect is the search for articles related to a certain research topic. In order to express relatedness, many text-mining or document retrieval approaches make use of the "bag of words" model, in which unstructured text is represented as a vector of word counts. Since full-length articles are not commonly available, many systems generate feature vectors from abstract data only, thereby limiting the explanatory power of their feature space. Since MeSH (Medical Subject Headings) terms assigned by human experts cover full-length articles, we propose for the first time a non-Euclidean document distance measure based on MeSH tree structures. We quantitatively evaluate the approach in comparison to a standard vector space approach and a hybrid version of both. The MeSH-based measure showed promising results, yet it is still surpassed by the vector space model.
Keywords :
data mining; information retrieval; medical information systems; tree data structures; MeSH term; MeSH tree structures; Medical Subject Headings; National Library of Medicine; PUBMED database; data-driven methods; document retrieval; labeling assistance; nonEuclidean document distance measure; pharmaceutical research; standard vector space model; text-mining; Biomedical measurements; Databases; Humans; Labeling; Length measurement; Libraries; Pharmaceuticals; Power generation; Power system modeling; Time measurement;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Engineering in Medicine and Biology Society, 2003. Proceedings of the 25th Annual International Conference of the IEEE
ISSN :
1094-687X
Print_ISBN :
0-7803-7789-3
Type :
conf
DOI :
10.1109/IEMBS.2003.1279511
Filename :
1279511