Title :
Normalization of Gene/Protein Names in Biological Literatures using Vector-Space Model
Author :
Joon-Ho Lim ; Hyunchul Jang ; Jaesoo Lim ; Soo-Jun Park
Author_Institution :
Electron. & Telecommun. Res. Inst., Daejeon
Abstract :
As the volume of biological literature grows exponentially, the need for text mining systems increases. In text mining, normalization is the task of mapping gene/protein names found in text to entries in a database. Normalization is necessary for combining information extracted from different articles and for curating a database or an ontology from the literature. Previous normalization studies compared database entries directly with names in the literature, but direct comparison is weak against the extreme variation of gene/protein names in text. Therefore, in this paper, we propose a normalization method based on the vector-space model. For each gene/protein name, we rank database identifiers using the vector-space model and select the identifier most similar to the name. Experimental results show that the proposed method achieves an F-measure of 70.7%.
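The ranking step in the abstract can be illustrated with a generic vector-space sketch. It is not the authors' implementation. Several choices are assumptions: character trigrams as terms, TF-IDF weighting with a smoothed IDF, and cosine similarity as the score. The functions `tokens`, `build_index`, `cosine` and `rank`, and the identifiers `GENE:0001` to `GENE:0003`, are hypothetical.

```python
import math
import re
from collections import Counter

def tokens(name, n=3):
    """Assumed tokenization: character n-grams of the lowercased name with
    non-alphanumerics stripped, so 'IL-2', 'IL 2' and 'IL2' share terms."""
    s = "#" + re.sub(r"[^a-z0-9]", "", name.lower()) + "#"  # '#' marks word boundaries
    return [s[i:i + n] for i in range(len(s) - n + 1)]

def build_index(dictionary):
    """dictionary maps identifier -> list of synonyms. Each identifier becomes
    one TF-IDF vector built from the n-grams of all its synonyms."""
    docs = {ident: Counter(t for syn in syns for t in tokens(syn))
            for ident, syns in dictionary.items()}
    df = Counter(t for tf in docs.values() for t in tf)      # document frequency per term
    idf = {t: math.log(len(docs) / d) + 1.0 for t, d in df.items()}  # smoothed IDF (assumption)
    vectors = {ident: {t: c * idf[t] for t, c in tf.items()}
               for ident, tf in docs.items()}
    return idf, vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(name, idf, vectors):
    """Return (identifier, score) pairs sorted from most to least similar."""
    tf = Counter(tokens(name))
    query = {t: c * idf[t] for t, c in tf.items() if t in idf}  # unseen n-grams are dropped
    scores = [(ident, cosine(query, vec)) for ident, vec in vectors.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy dictionary; real systems would load a gene/protein database.
DICTIONARY = {
    "GENE:0001": ["interleukin 2", "IL-2", "IL2"],
    "GENE:0002": ["interleukin 6", "IL-6", "IL6"],
    "GENE:0003": ["tumor necrosis factor", "TNF-alpha", "TNF"],
}

if __name__ == "__main__":
    idf, vectors = build_index(DICTIONARY)
    for ident, score in rank("IL 2", idf, vectors):
        print(ident, round(score, 3))
```

In this sketch, character n-grams let a spelling variant such as "IL 2" still match the synonyms "IL-2" and "IL2", which an exact string comparison against the dictionary would miss. The top-ranked identifier would then be taken as the normalization result.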
Keywords :
biochemistry; biology computing; data mining; genetics; molecular biophysics; ontologies (artificial intelligence); pattern recognition; proteins; text analysis; vectors; biological literatures; gene names; normalization method; ontology; protein names; text mining system; vector-space model; Biological system modeling; Data mining; Dictionaries; Humans; Learning systems; Noise measurement; Ontologies; Proteins; Relational databases; Text mining; Abstracting and Indexing as Topic; Databases, Genetic; Genes; Models, Theoretical; Proteins; Terminology as Topic;
Conference_Title :
Engineering in Medicine and Biology Society, 2007. EMBS 2007. 29th Annual International Conference of the IEEE
Conference_Location :
Lyon
Print_ISBN :
978-1-4244-0787-3
DOI :
10.1109/IEMBS.2007.4352306