DocumentCode :
3502549
Title :
Bio-molecular event extraction using Support Vector Machine
Author :
Saha, Sriparna ; Hasanuzzaman, Md ; Majumder, Amit ; Ekbal, Asif
Author_Institution :
Dept. of Comput. Sci. & Eng., IIT Patna, Patna, India
fYear :
2011
fDate :
14-16 Dec. 2011
Firstpage :
298
Lastpage :
303
Abstract :
The main goal of Biomedical Natural Language Processing (BioNLP) is to capture biomedical phenomena from textual data by extracting relevant entities, information, and relations between biomedical entities (i.e., proteins and genes). Most published work has extracted only binary relations. Recently, the focus has shifted towards extracting more complex relations in the form of bio-molecular events, which may include several entities or other relations. In this paper we propose an approach to event extraction (detection and classification) for relatively complex bio-molecular events. We treat the task as a supervised classification problem and use the well-known Support Vector Machine (SVM) algorithm with statistical and linguistic features that represent morphological, syntactic, and contextual information about the candidate bio-molecular trigger words. We first consider event detection and classification as a two-step process: the first step detects events, and the second step classifies the detected events into one of nine predefined classes. We then treat the problem as a one-step process and perform event detection and classification together. Three-fold cross-validation experiments on the BioNLP 2009 shared task datasets yield overall average recall, precision, and F-measure values of 62.95%, 74.53%, and 68.25%, respectively, for event detection. The overall classification accuracy is 72.50%. When detection and classification are performed together, the proposed approach achieves overall recall, precision, and F-measure values of 57.66%, 55.87%, and 56.75%, respectively.
Keywords :
medical computing; natural language processing; pattern classification; support vector machines; text analysis; BioNLP 2009 shared task dataset; F-measure value; average recall value; binary relation extraction; bio-molecular event extraction; bio-molecular trigger words; biomedical entity; biomedical natural language processing; biomedical phenomena; contextual information; event classification; event detection; genes; linguistic feature; morphological information; precision value; proteins; statistical feature; supervised classification problem; support vector machine; syntactic information; textual data; three-fold cross validation; Context; Feature extraction; Kernel; Proteins; Support vector machine classification; Training; Biomedical natural language processing; Detection and classification; Event extraction; Support vector machine; Text mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Computing (ICoAC), 2011 Third International Conference on
Conference_Location :
Chennai
Print_ISBN :
978-1-4673-0670-6
Type :
conf
DOI :
10.1109/ICoAC.2011.6165192
Filename :
6165192
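Note on the reported scores :
The abstract reports recall, precision, and F-measure but does not state the F-measure formula. The sketch below assumes the standard balanced F-measure (the harmonic mean of precision and recall) and checks the reported figures against it. The function name `f_measure` is a hypothetical helper introduced here for illustration and is not taken from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall).

    Assumption: the paper uses the standard F1 definition; the abstract
    does not state the formula explicitly.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, in percent.
detection_only = f_measure(precision=74.53, recall=62.95)
joint_detection_classification = f_measure(precision=55.87, recall=57.66)

print(f"event detection: {detection_only:.2f}")
print(f"joint detection and classification: {joint_detection_classification:.2f}")
```

Under this assumption, both values round to the F-measures given in the abstract (68.25 and 56.75), which suggests the reported scores are internally consistent.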