Title :
Using MEDLINE as a knowledge source for disambiguating abbreviations in full-text biomedical journal articles
Author :
Hong Yu ; Won Kim
Author_Institution :
Dept. of Comput. Sci., Columbia Univ., New York, NY, USA
Abstract :
Abbreviations and acronyms are widely used in the biomedical literature. Because many abbreviations represent important content, information retrieval and extraction benefit from identifying their meanings. Because many abbreviations are also ambiguous, it is important to map each abbreviation to its full form, which ultimately represents its meaning. In this study, we present a novel unsupervised method that uses MEDLINE records as a knowledge source for disambiguating abbreviations in full-text biomedical journal articles. We first automatically generated from MEDLINE records a knowledge source, or dictionary, of abbreviation and full-form pairs. We then trained on MEDLINE records and predicted the full forms of abbreviations in full-text journal articles by applying supervised machine-learning algorithms in an unsupervised fashion. We report up to 92% prediction precision and up to 91% coverage.
Keywords :
dictionaries; information retrieval; learning (artificial intelligence); medical computing; nomenclature; text analysis; MEDLINE; abbreviation disambiguation; dictionary; full-text biomedical journal articles; information extraction; information retrieval; knowledge source; prediction precision; supervised machine-learning algorithms; Abstracts; Biomedical computing; Biotechnology; Computer science; Data mining; Dictionaries; Information retrieval; Learning systems; Pattern matching; Proteins
Conference_Title :
Computer-Based Medical Systems, 2004. CBMS 2004. Proceedings. 17th IEEE Symposium on
Print_ISBN :
0-7695-2104-5
DOI :
10.1109/CBMS.2004.1311686
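
Illustrative sketch (not from the paper): the abstract describes two steps, building a dictionary of abbreviation and full-form pairs from MEDLINE records, then training on those records to predict the full form of an abbreviation in new text. The Python below is one minimal way such a pipeline could look. Everything in it is an assumption made for illustration: the "full form (ABBR)" initial-letter heuristic, the naive Bayes scoring with add-one smoothing and a uniform prior, the function names, and the toy records. The abstract does not state which extraction rules or learning algorithms the authors used.

```python
import math
import re
from collections import Counter, defaultdict

PAREN_ABBR = re.compile(r"\(([A-Z]{2,6})\)")   # e.g. "(PCA)"
TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN.findall(text.lower())


def extract_pairs(records):
    """Collect abbreviation -> {full forms} from 'full form (ABBR)' patterns.

    Hypothetical heuristic: the N words before an N-letter abbreviation
    must have initials that spell the abbreviation.
    """
    pairs = defaultdict(set)
    for text in records:
        for match in PAREN_ABBR.finditer(text):
            abbr = match.group(1)
            words = text[:match.start()].split()[-len(abbr):]
            initials = "".join(w[0] for w in words).upper()
            if initials == abbr:
                pairs[abbr].add(" ".join(words).lower())
    return pairs


def train(records, pairs):
    """Count context words per (abbreviation, full form).

    Labels come for free: a record that spells out a full form is treated
    as a training example for that sense, so no manual annotation is used.
    """
    model = defaultdict(lambda: defaultdict(Counter))
    for text in records:
        lowered = text.lower()
        for abbr, full_forms in pairs.items():
            for full in full_forms:
                if full in lowered:
                    context = tokenize(lowered.replace(full, " "))
                    model[abbr][full].update(
                        t for t in context if t != abbr.lower()
                    )
    return model


def predict(model, abbr, context_text):
    """Pick the full form whose context words best match (naive Bayes)."""
    senses = model.get(abbr)
    if not senses:
        return None  # abbreviation not covered by the dictionary
    vocab = {w for counts in senses.values() for w in counts}
    tokens = [t for t in tokenize(context_text) if t != abbr.lower()]
    best, best_score = None, -math.inf
    for full, counts in sorted(senses.items()):
        total = sum(counts.values())
        score = sum(
            math.log((counts[t] + 1) / (total + len(vocab))) for t in tokens
        )
        if score > best_score:
            best, best_score = full, score
    return best


# Toy stand-ins for MEDLINE records (invented for this example).
records = [
    "Principal component analysis (PCA) reduced the dimensionality "
    "of the gene expression data.",
    "Patient controlled analgesia (PCA) lowered pain scores after surgery.",
]
pairs = extract_pairs(records)
model = train(records, pairs)
print(predict(model, "PCA", "PCA of the gene expression data"))
```

Returning None for an abbreviation that is missing from the dictionary mirrors the distinction the abstract draws between precision and coverage: a system of this kind can only predict for abbreviations it has seen paired with a full form.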