  • DocumentCode
    24620
  • Title
    Curatable Named-Entity Recognition Using Semantic Relations
  • Author
    Yi-Yu Hsu ; Hung-Yu Kao
  • Author_Institution
    Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
  • Volume
    12
  • Issue
    4
  • fYear
    2015
  • fDate
    July-Aug. 1 2015
  • Firstpage
    785
  • Lastpage
    792
  • Abstract
    Named-entity recognition (NER) plays an important role in the development of biomedical databases. However, existing NER tools produce a wide variety of named entities, which may include both curatable and non-curatable markers. A straightforward way to accelerate the biocuration workflow is to classify which named entities are curatable. Co-occurrence Interaction Nexus with Named-entity Recognition (CoINNER) is a web-based tool that allows users to identify gene, chemical, disease, and action term mentions in the Comparative Toxicogenomic Database (CTD). To further discover interactions, CoINNER uses multiple advanced algorithms to recognize these mentions in the BioCreative IV CTD Track. CoINNER extends our prototype system from BioCreative 2012 Track I (literature triage), which annotated gene, chemical, and disease mentions in PubMed abstracts. The pre-tagging in CoINNER is based on the state-of-the-art named-entity recognition tools from BioCreative III. Next, a method based on conditional random fields (CRFs) is proposed to predict chemical and disease mentions in the articles. Finally, action term mentions are collected by latent Dirichlet allocation (LDA). At the BioCreative IV CTD Track, the best F-measures reached for gene/protein, chemical/drug, and disease NER were 54 percent, while CoINNER achieved a 61.5 percent F-measure. System URL: http://ikmbio.csie.ncku.edu.tw/coinner/introduction.htm.
  • Keywords
    bioinformatics; diseases; drugs; genetics; genomics; pattern recognition; BioCreative 2012 Track I; BioCreative IV CTD Track; F-measures; NER tools; PubMed abstracts; action term mentions; annotated gene; biocuration workflow; biomedical databases; chemical-drug NER; comparative toxicogenomic database; conditional random fields; curatable named-entities; curatable named-entity recognition; disease NER; gene-protein NER; latent Dirichlet allocation; multifarious named-entities; multiple advanced algorithms; prototype system; semantic relations; state-of-the-art named entity recognition tools; Chemicals; Diseases; Feature extraction; IEEE transactions; Semantics; Support vector machines; Training; Biomedical text mining; curated term identification; named-entity recognition;
  • fLanguage
    English
  • Journal_Title
    Computational Biology and Bioinformatics, IEEE/ACM Transactions on
  • Publisher
    IEEE
  • ISSN
    1545-5963
  • Type
    jour
  • DOI
    10.1109/TCBB.2014.2366770
  • Filename
    6945344