Title :
Feature selection for event extraction in biomedical text
Author :
Majumder, Amit ; Hasanuzzaman, Mohammed ; Ekbal, Asif
Author_Institution :
Dept. of Comput. Sci. & Eng., Acad. of Technol., India
Abstract :
In this paper we report our work on a multiobjective optimization (MOO) based feature selection approach for event extraction in biomedical texts. Event extraction deals with the detection and classification of expressions that represent complex biological phenomena involving genes and proteins. We perform feature selection within the framework of a robust machine learning algorithm, namely Conditional Random Field (CRF). We implement a set of diverse features that exploit lexical, shallow syntactic and contextual information. First, we develop a single objective optimization (SOO) based feature selection technique in which we optimize the F-measure. Thereafter we develop two different MOO based feature selection models by optimizing different pairs of objective functions: recall and precision, and feature count and F-measure. We carry out experiments on the benchmark setup of the BioNLP-2013 shared task. We obtain the best performance with overall average recall, precision and F-measure values of 57.04%, 75.08% and 64.77%, respectively. Evaluation shows that the classifier can achieve a good level of performance when trained with an effective feature set. We also observe that the MOO based approach can indeed perform better than the SOO based approach.
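The difference between optimizing a single F-measure and optimizing a pair of objectives such as recall and precision can be illustrated with a small sketch. It is not the authors' implementation: the feature masks and scores below are hypothetical, and the helper names (`f_measure`, `dominates`, `pareto_front`) are assumptions made for illustration. It only shows how candidate feature subsets might be compared by Pareto dominance when both objectives are maximized.

```python
from typing import List, Sequence, Tuple

# A candidate is a (feature_mask, (recall, precision)) pair.
# All masks and scores here are hypothetical, for illustration only.
Candidate = Tuple[str, Tuple[float, float]]


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every objective and strictly
    better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other[1], c[1]) for other in candidates)
    ]


# Each mask marks which of four hypothetical features are switched on.
candidates: List[Candidate] = [
    ("1101", (0.55, 0.76)),
    ("1001", (0.57, 0.75)),
    ("0111", (0.50, 0.70)),
    ("1111", (0.57, 0.74)),
]

front = pareto_front(candidates)

# A single-objective search would instead rank by one scalar score.
best_by_f = max(candidates, key=lambda c: f_measure(c[1][1], c[1][0]))
```

A single-objective search returns one subset (`best_by_f`), whereas the multiobjective view keeps every non-dominated trade-off in `front`. For an objective pair such as feature count and F-measure, the count would be minimized, which could be handled in this sketch by negating it before the comparison.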
Keywords :
data mining; feature selection; genetics; learning (artificial intelligence); medical computing; optimisation; pattern classification; proteins; random processes; text analysis; BioNLP-2013 shared task; F-measure function optimization; biomedical texts; complex biological phenomenon representation; conditional random field; contextual information; event extraction; expression classification; expression detection; feature count; genes; lexical information; multiobjective optimization based feature selection approach; precision objective function; proteins; recall objective function; robust machine learning algorithm; shallow syntactic information; single objective optimization based feature selection technique; text mining; Biological cells; Context; Feature extraction; Genetic algorithms; Linear programming; Optimization; Proteins; Conditional Random Field; Event Extraction; Feature Selection; Text Mining;
Conference_Title :
Advances in Pattern Recognition (ICAPR), 2015 Eighth International Conference on
Conference_Location :
Kolkata
DOI :
10.1109/ICAPR.2015.7050708