Title :
Two stage genetic approach for bio-chemical named entity recognition
Author :
Ekbal, Asif ; Saha, Simanto
Author_Institution :
Dept. of Comput. Sci. & Eng., Indian Inst. of Technol. Patna, Patna, India
Abstract :
Identifying mentions of chemical names in text has widespread real-world applications. Chemical names are complex, and several representations and nomenclatures exist (such as SMILES, InChI, and IUPAC), which makes their automatic identification and classification challenging. In this paper we present a feature selection approach that determines an appropriate feature subset for a well-known supervised machine learning method, namely a conditional random field (CRF) based classifier. Several features are identified and extracted without using any domain-specific knowledge or resources, in order to determine mentions of IUPAC and IUPAC-like names in scientific text with a supervised classification technique. The appropriate feature set for a particular supervised classifier is then selected from this large collection of features using a single objective genetic algorithm based feature selection technique. Experiments are carried out on the benchmark patent dataset. Evaluation shows encouraging performance, with an overall F-measure of 70.01% for the single objective optimization based approach on the patent 2008 test data set.
Keywords :
chemistry computing; feature extraction; genetic algorithms; learning (artificial intelligence); pattern classification; CRF; F-measure; IUPAC names; IUPAC-like names; bio-chemical named entity recognition; chemical names classification; chemical names identification; conditional random field based classifier; feature identification; feature selection approach; feature subset selection; features extraction; single objective genetic algorithm; single objective optimization; supervised classification technique; supervised machine learning approach; two stage genetic approach; Biological cells; Chemicals; Feature extraction; Genetic algorithms; Patents; Sociology; Statistics; Conditional Random Field; Feature Selection; Genetic Algorithm; Mention Detection and Classification from Biochemical Domain;
Conference_Title :
Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on
Conference_Location :
Mysore
Print_ISBN :
978-1-4799-2432-5
DOI :
10.1109/ICACCI.2013.6637260