  • DocumentCode
    24433
  • Title
    SimConcept: A Hybrid Approach for Simplifying Composite Named Entities in Biomedical Text
  • Author
    Chih-Hsuan Wei; Robert Leaman; Zhiyong Lu
  • Author_Institution
    National Institutes of Health & National Center for Biotechnology Information (NCBI), Bethesda, MD, USA
  • Volume
    19
  • Issue
    4
  • fYear
    2015
  • fDate
    Jul. 2015
  • Firstpage
    1385
  • Lastpage
    1391
  • Abstract
    One particular challenge in biomedical named entity recognition (NER) and normalization is the identification and resolution of composite named entities, where a single span refers to more than one concept (e.g., BRCA1/2). Previous NER and normalization studies have either ignored composite mentions, used simple ad hoc rules, or handled only coordination ellipsis, so a robust approach for handling multiple types of composite mentions is greatly needed. To this end, we propose a hybrid method that integrates a machine-learning model with a pattern identification strategy to identify the individual components of each composite mention. Our method, which we have named SimConcept, is the first to systematically handle many types of composite mentions. The technique achieves high performance in identifying and resolving composite mentions for three key biological entities: genes (F-measure 90.42%), diseases (F-measure 86.47%), and chemicals (F-measure 86.05%). Furthermore, our results show that SimConcept can subsequently improve the performance of gene and disease concept recognition and normalization. SimConcept is available for download at: http://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/SimConcept/.
  • Keywords
    data mining; diseases; learning (artificial intelligence); medical computing; pattern recognition; text analysis; SimConcept method; biomedical named entity recognition; coordination ellipsis; disease concept recognition; gene concept recognition; machine-learning model; pattern identification; Abstracts; Chemicals; Diseases; Informatics; Protein engineering; Proteins; Semantics; bioNLP; composite mentions; named entity recognition; text mining
  • fLanguage
    English
  • Journal_Title
    IEEE Journal of Biomedical and Health Informatics
  • Publisher
    IEEE
  • ISSN
    2168-2194
  • Type
    jour
  • DOI
    10.1109/JBHI.2015.2422651
  • Filename
    7084590