DocumentCode :
3227876
Title :
Two-phase biomedical named entity recognition based on semi-CRFs
Author :
Yang, Li ; Zhou, Yanhong
Author_Institution :
Sch. of Comput. Sci. & Technol., Huazhong Univ. of Sci. & Technol., Wuhan, China
fYear :
2010
fDate :
23-26 Sept. 2010
Firstpage :
1061
Lastpage :
1065
Abstract :
As a crucial step for other tasks, such as human gene/protein normalization, relationship extraction and hypothesis generation, biomedical named entity recognition remains a challenging task. This paper presents a two-phase approach based on semi-CRFs and novel feature sets. Semi-CRFs assign a label to a segment rather than to a single word, which is more natural than in other machine learning methods. Our approach divides biomedical NER into two sub-tasks: term boundary detection and semantic labeling. In the first phase, the term boundary detection sub-task detects the boundaries of the entities and classifies all entities into a single type C. In the second phase, the semantic labeling sub-task assigns the correct entity type to each entity detected in the first phase. For comparison, experiments are conducted with both a CRF model and a semi-CRF model at each phase. Our experiments on the JNLPBA2004 datasets achieve an F-score of 73.20% with semi-CRFs, without deep domain knowledge or a post-processing algorithm, which outperforms most state-of-the-art systems.
Keywords :
information retrieval; learning (artificial intelligence); medical information systems; biomedical NER; conditional random fields; human gene/protein normalization; hypothesis generation; machine learning methods; postprocessing algorithm; relationship extraction entity; semantic labeling subtask; semiCRF; two phase biomedical named entity recognition; Biological system modeling; Computational modeling; DNA; Hidden Markov models; Protein engineering; Proteins; RNA; feature sets; named entity recognition; semi-CRFs; two phases;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bio-Inspired Computing: Theories and Applications (BIC-TA), 2010 IEEE Fifth International Conference on
Conference_Location :
Changsha
Print_ISBN :
978-1-4244-6437-1
Type :
conf
DOI :
10.1109/BICTA.2010.5645108
Filename :
5645108