Title :
Boosting performance of gene mention tagging system by classifiers ensemble
Author :
Li, Lishuang ; Sun, Jing ; Huang, Degen
Author_Institution :
Sch. of Comput. Sci. & Eng., Dalian Univ. of Technol., Dalian, China
Abstract :
To improve on the tagging performance of single classifiers, a classifiers ensemble framework is presented for gene mention tagging. In the framework, six classifiers are built with four toolkits (CRF++, YamCha, Maximum Entropy (ME) and MALLET) using different training methods and feature sets, and are then combined with a two-layer stacking algorithm. The recognition results of the individual classifiers are used as input feature vectors for the combination, which yields a stronger model. Experiments on the BioCreative II GM task corpus show that the classifiers ensemble method is effective: the best combination method achieves an F-score of 88.09%, which outperforms most of the top-ranked Bio-NER systems in the BioCreative II GM challenge.
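The stacking idea in the abstract, where base-classifier outputs become the feature vector for a second-layer model, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say which learner is used in the second layer, so a simple count-based lookup table stands in for it here, with majority voting as a fallback. The function names, the three base taggers, and the B/I/O tags in the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def train_meta(base_predictions, gold_labels):
    """Second layer (assumed): map each tuple of base-classifier outputs
    to the gold label it most often co-occurs with on held-out data."""
    counts = defaultdict(Counter)
    for features, gold in zip(base_predictions, gold_labels):
        counts[tuple(features)][gold] += 1
    return {feats: c.most_common(1)[0][0] for feats, c in counts.items()}


def predict_meta(meta_model, features):
    """Apply the second layer; use majority voting for unseen tuples."""
    key = tuple(features)
    if key in meta_model:
        return meta_model[key]
    return Counter(key).most_common(1)[0][0]


# Hypothetical held-out data: each row holds the tags that three
# base taggers assigned to one token.
held_out_predictions = [
    ("B", "B", "O"),
    ("B", "B", "O"),
    ("O", "B", "O"),
    ("O", "B", "O"),
    ("I", "O", "I"),
    ("O", "O", "O"),
]
held_out_gold = ["B", "B", "O", "O", "I", "O"]

meta_model = train_meta(held_out_predictions, held_out_gold)

# A tuple seen in training: the second layer has learned that the
# second tagger's lone "B" vote is usually wrong.
stacked_known = predict_meta(meta_model, ("O", "B", "O"))

# A tuple never seen in training: falls back to majority voting.
stacked_unseen = predict_meta(meta_model, ("I", "I", "O"))
```

In practice the second layer would be trained on base-classifier predictions produced for data the base classifiers did not see during their own training, so that it learns their real error patterns rather than their fit to the training set.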
Keywords :
bioinformatics; data mining; maximum entropy methods; pattern classification; text analysis; Bio-NER systems; BioCreative II GM task; CRF++; F-score; MALLET; YamCha; classifiers ensemble; gene mention tagging system; input feature vectors; maximum entropy; two layer stacking algorithm; Biology; Educational institutions; Software; Gene Mention Tagging; Named Entity Recognition; Text Mining
Conference_Title :
Natural Language Processing and Knowledge Engineering (NLP-KE), 2010 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-6896-6
DOI :
10.1109/NLPKE.2010.5587822