Title :
Mention detection and classification in bio-chemical domain using Conditional Random Field
Author :
Ekbal, Asif ; Saha, Simanto ; Ravi, Koustuban
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Patna, Patna, India
Date :
Nov. 30 2012-Dec. 1 2012
Abstract :
Finding mentions of chemical names in text is of great interest because of its importance across a wide range of application areas. The inherently complex structure of chemical names and the existence of several representations and nomenclatures (such as SMILES, InChI and IUPAC) make their automatic identification and classification challenging. In this paper, we present a supervised machine learning approach based on Conditional Random Fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text. We identify and implement a rich feature set for the task without using any domain-specific knowledge or resources. Experiments are carried out on benchmark MEDLINE datasets. Evaluation shows encouraging performance, with overall recall, precision and F-measure values of 90.96%, 91.52% and 91.23%, respectively. We also discuss the scope for comparison with existing state-of-the-art systems.
Keywords :
learning (artificial intelligence); medical computing; text analysis; F-measure value; MEDLINE dataset; bio-chemical domain; chemical name; conditional random field; mention classification; mention detection; precision value; recall value; scientific text; supervised machine learning approach; Chemicals; Context; Data mining; Feature extraction; Patents; Training; Training data;
Conference_Title :
Emerging Applications of Information Technology (EAIT), 2012 Third International Conference on
Conference_Location :
Kolkata
Print_ISBN :
978-1-4673-1828-0
DOI :
10.1109/EAIT.2012.6407943