DocumentCode
24433
Title
SimConcept: A Hybrid Approach for Simplifying Composite Named Entities in Biomedical Text
Author
Wei, Chih-Hsuan ; Leaman, Robert ; Lu, Zhiyong
Author_Institution
Nat. Inst. of Health & Nat. Center for Biotechnol. Inf. (NCBI), Bethesda, MD, USA
Volume
19
Issue
4
fYear
2015
fDate
July 2015
Firstpage
1385
Lastpage
1391
Abstract
A particular challenge in biomedical named entity recognition (NER) and normalization is identifying and resolving composite named entities, in which a single text span refers to more than one concept (e.g., BRCA1/2, which denotes both BRCA1 and BRCA2). Previous NER and normalization studies have ignored composite mentions, applied simple ad hoc rules, or handled only coordination ellipsis, so a robust approach for handling the many types of composite mentions is greatly needed. To this end, we propose a hybrid method that integrates a machine-learning model with a pattern identification strategy to identify the individual components of each composite mention. Our method, SimConcept, is the first to systematically handle many types of composite mentions. It achieves high performance in identifying and resolving composite mentions for three key biological entity types: genes (F-measure 90.42%), diseases (F-measure 86.47%), and chemicals (F-measure 86.05%). Furthermore, our results show that applying SimConcept improves the performance of subsequent gene and disease concept recognition and normalization. SimConcept is available for download at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/SimConcept/.
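To make the notion of "resolving" a composite mention concrete, the sketch below expands two simple surface forms, a shared stem with slash-separated numeric suffixes (e.g., BRCA1/2) and a numeric range (e.g., IL1-3), into individual mentions. This is NOT SimConcept's hybrid machine-learning method; it is a toy example of the kind of simple ad hoc rule the abstract says earlier work relied on. The names `expand_composite`, `SLASH_SUFFIXES`, and `NUMERIC_RANGE` are hypothetical and are used only for illustration.

```python
import re

# Hypothetical toy pattern (not SimConcept): a stem ending in a letter,
# followed by two or more numeric suffixes joined by "/" (e.g. "BRCA1/2").
SLASH_SUFFIXES = re.compile(r"^(?P<stem>.*?[A-Za-z])(?P<nums>\d+(?:/\d+)+)$")

# Hypothetical toy pattern: a stem ending in a letter, followed by an
# ascending numeric range written with "-" (e.g. "IL1-3").
NUMERIC_RANGE = re.compile(r"^(?P<stem>.*?[A-Za-z])(?P<lo>\d+)-(?P<hi>\d+)$")


def expand_composite(mention: str) -> list[str]:
    """Split a composite mention into its individual mentions.

    Only the two surface patterns above are recognized; any other
    mention is returned unchanged as a single-element list.
    """
    m = SLASH_SUFFIXES.match(mention)
    if m:
        # Attach each slash-separated suffix to the shared stem.
        return [m["stem"] + num for num in m["nums"].split("/")]

    m = NUMERIC_RANGE.match(mention)
    if m and int(m["lo"]) < int(m["hi"]):
        # Enumerate every integer in the inclusive range.
        lo, hi = int(m["lo"]), int(m["hi"])
        return [m["stem"] + str(i) for i in range(lo, hi + 1)]

    return [mention]


if __name__ == "__main__":
    for text in ["BRCA1/2", "CYP2C9/19", "IL1-3", "BRCA1"]:
        print(text, "->", expand_composite(text))
```

Rules like these break down quickly on other composite types, such as coordination ellipsis ("breast and ovarian cancers"), which is why the abstract motivates a learned, multitype approach instead.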
Keywords
data mining; diseases; learning (artificial intelligence); medical computing; pattern recognition; text analysis; SimConcept method; biomedical named entity recognition; coordination ellipsis; disease concept recognition; gene concept recognition; machine-learning model; pattern identification; Abstracts; Chemicals; Diseases; Informatics; Protein engineering; Proteins; Semantics; bioNLP; composite mentions; named entity recognition; text mining
fLanguage
English
Journal_Title
Biomedical and Health Informatics, IEEE Journal of
Publisher
IEEE
ISSN
2168-2194
Type
jour
DOI
10.1109/JBHI.2015.2422651
Filename
7084590