Regional Information Center for Science and Technology (RICEST) - Identifying Abbreviation Definitions Machine Learning with Naturally Labeled Data

DocumentCode :

2454457

Title :

Identifying Abbreviation Definitions Machine Learning with Naturally Labeled Data

Author :

Yeganova, Lana ; Comeau, Donald C. ; Wilbur, W. John

Author_Institution :

Nat. Center for Biotechnol. Inf., NIH, Bethesda, MD, USA

fYear :

2010

fDate :

12-14 Dec. 2010

Firstpage :

499

Lastpage :

505

Abstract :

The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. In this work, we develop a machine learning algorithm for abbreviation definition identification in text. Most existing approaches for abbreviation definition identification employ rule-based methods. While achieving high precision, rule-based methods are limited to the rules defined and fail to capture many uncommon definition patterns. Supervised learning techniques, which offer more flexibility in detecting abbreviation definitions, have also been applied to the problem. However, they require manually labeled training data. In this study, we make use of what we term naturally labeled data. Positive training examples are extracted from text, which provides naturally occurring potential abbreviation-definition pairs. Negative training examples are generated randomly by mixing potential abbreviations with unrelated potential definitions. The machine learner is trained to distinguish between these two sets of examples. Then, the learned feature weights are used to identify the abbreviation full form. This approach does not require manually labeled training data. We evaluate the performance of our algorithm on the Ab3P, BIOADI and Medstract corpora. We achieve an F-score that is comparable to that of existing systems, yet with a higher recall.
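The construction of naturally labeled data described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the record does not specify how candidates are extracted or which features are used, so the parenthesis pattern `PAREN`, the word-window size `max_words`, and the helper names `candidate_pairs` and `negative_pairs` are all assumptions made for illustration. The sketch only builds the two example sets; training a classifier on them is left out.

```python
import random
import re

# Assumed pattern (not from the paper): a parenthesised token of 2-10
# characters that starts with a letter or digit, e.g. "(MRI)".
PAREN = re.compile(r"\(([A-Za-z0-9][\w-]{1,9})\)")


def candidate_pairs(sentence, max_words=5):
    """Return (abbreviation, preceding-words window) pairs found in a sentence.

    These naturally occurring pairs serve as positive examples; the window
    is only a *potential* definition and may contain extra leading words.
    """
    pairs = []
    for match in PAREN.finditer(sentence):
        window = sentence[:match.start()].split()[-max_words:]
        if window:
            pairs.append((match.group(1), " ".join(window)))
    return pairs


def negative_pairs(positives, seed=0):
    """Pair each abbreviation with a potential definition from another pair."""
    rng = random.Random(seed)
    negatives = []
    for abbr, _ in positives:
        # Definitions that were observed next to a different abbreviation.
        others = [d for a, d in positives if a != abbr]
        if others:
            negatives.append((abbr, rng.choice(others)))
    return negatives


sentences = [
    "Patients received magnetic resonance imaging (MRI) scans.",
    "The polymerase chain reaction (PCR) was used.",
]
positives = [p for s in sentences for p in candidate_pairs(s, max_words=3)]
negatives = negative_pairs(positives)
print(positives)
print(negatives)
```

A classifier trained to separate `positives` from `negatives` would then supply the feature weights that the abstract says are used to pick out the full form; the choice of learner and features is outside what this record states.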

Keywords :

learning (artificial intelligence); medical information systems; text analysis; F-score; abbreviation definition identification; abbreviation detection; biomedical literature; labeled training data; machine learning; naturally labeled data; negative training example; positive training example; text analysis; text processing tool; Compounds; Hardware design languages; Humans; Lipidomics; Machine learning algorithms; Training; Training data;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Machine Learning and Applications (ICMLA), 2010 Ninth International Conference on

Conference_Location :

Washington, DC

Print_ISBN :

978-1-4244-9211-4

Type :

conf

DOI :

10.1109/ICMLA.2010.166

Filename :

5708877

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2454457