Title :
Curatable Named-Entity Recognition Using Semantic Relations
Author :
Yi-Yu Hsu ; Hung-Yu Kao
Author_Institution :
Dept. of Comput. Sci. & Inf. Eng., Nat. Cheng Kung Univ., Tainan, Taiwan
Abstract :
Named-entity recognition (NER) plays an important role in the development of biomedical databases. However, existing NER tools produce multifarious named-entities, which may result in both curatable and non-curatable markers. Classifying which named-entities are curatable is a straightforward way to accelerate the biocuration workflow. Co-occurrence Interaction Nexus with Named-entity Recognition (CoINNER) is a web-based tool that allows users to identify gene, chemical, disease, and action term mentions in the Comparative Toxicogenomics Database (CTD). To further discover interactions, CoINNER uses multiple advanced algorithms to recognize the mentions in the BioCreative IV CTD Track. CoINNER extends our earlier prototype system, which annotated gene, chemical, and disease mentions in PubMed abstracts at BioCreative 2012 Track I (literature triage). Its pre-tagging results are based on the state-of-the-art named-entity recognition tools from BioCreative III. Next, a method based on conditional random fields (CRFs) is proposed to predict chemical and disease mentions in the articles. Finally, action term mentions are collected by latent Dirichlet allocation (LDA). At the BioCreative IV CTD Track, the best F-measures reached for gene/protein, chemical/drug and disease NER were 54 percent while CoINNER achieved a 61.5 percent F-measure. System URL: http://ikmbio.csie.ncku.edu.tw/coinner/introduction.htm
Keywords :
bioinformatics; diseases; drugs; genetics; genomics; pattern recognition; BioCreative 2012 Track I; BioCreative IV CTD Track; F-measures; NER tools; PubMed abstracts; action term mentions; annotated gene; biocuration workflow; biomedical databases; chemical-drug NER; comparative toxicogenomic database; conditional random fields; curatable named-entities; curatable named-entity recognition; disease NER; gene-protein NER; latent Dirichlet allocation; multifarious named-entities; multiple advanced algorithms; prototype system; semantic relations; state-of-the-art named entity recognition tools; Chemicals; Diseases; Feature extraction; IEEE transactions; Semantics; Support vector machines; Training; Biomedical text mining; curated term identification; named-entity recognition;
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
DOI :
10.1109/TCBB.2014.2366770