Title :
Sequence memoizer based model for Biomedical Named Entity Recognition
Author :
Sun, Yaming ; Sun, Chengjie ; Lin, Lei ; Liu, Ming ; Wang, Xiaolong
Author_Institution :
Sch. of Comput. Sci. & Technol., Harbin Inst. of Technol., Harbin, China
Abstract :
Biomedical Named Entity Recognition (Bio-NER) plays a fundamental role in biomedical text mining, and improving the Bio-NER module's performance brings notable enhancement to the entire text mining system. Treating the Bio-NER task as a sequence labeling problem, this paper proposes a generative approach to address it. Adopting the spirit of the Sequence Memoizer, our approach assigns named entity labels to the elements of a given sequence, based on better modeling of the context word distributions. Compared with traditional generative models such as the Hidden Markov Model (HMM), our approach better captures the power-law scaling and long-range dependencies of natural language. Our method is evaluated on the JNLPBA 2004 dataset and compared with classic generative and discriminative approaches. The experimental results show that it outperforms the HMM and is comparable with the maximum entropy (Maxent) model.
Keywords :
data mining; medical computing; natural language processing; pattern recognition; text analysis; JNLPBA 2004 dataset; biomedical named entity recognition; biomedical text mining; context word distribution modeling; long range dependency; natural language; power-law scaling; sequence labeling problem; sequence memoizer based model; Biological system modeling; Computational modeling; Context; Data models; Hidden Markov models; Training; Training data; Sequence Memoizer; generative model
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2012 9th International Conference on
Conference_Location :
Sichuan
Print_ISBN :
978-1-4673-0025-4
DOI :
10.1109/FSKD.2012.6234154