Title :
Identifying gene and protein names from biological texts
Author :
Xuan, Weijian ; Watson, Stanley J. ; Akil, Huda ; Meng, Fan
Author_Institution :
Dept. of Psychiatry, Michigan Univ., Ann Arbor, MI, USA
Abstract :
Extracting and identifying gene and protein names from the literature is a critical step in mining functional information about genes and proteins. While extensive efforts have been devoted to this important task, most have aimed at extracting gene/protein names per se, paying little attention to associating the extracted names with existing gene and protein database entries. We developed a simple and efficient method to identify gene and protein names in the literature using a combination of heuristic and statistical strategies. Our approach maps the extracted names to individual LocusLink entries, thus enabling the seamless integration of literature information with existing gene/protein databases. Evaluation on a test corpus shows that our method achieves both high recall and high precision. Our method exhibits good performance and can be used as a building block for large biomedical literature mining systems.
Keywords :
biology computing; data mining; genetics; identification; literature; proteins; LocusLink entries; biological texts; biomedical literature mining systems; gene database entries; gene functional information mining; gene name extraction; gene name identification; heuristic strategies; literature information seamless integration; protein database entries; protein name extraction; protein names identification; proteins functional information mining; statistical strategies; test corpus; Bioinformatics
Conference_Title :
Bioinformatics Conference, 2003. CSB 2003. Proceedings of the 2003 IEEE
Print_ISBN :
0-7695-2000-6
DOI :
10.1109/CSB.2003.1227431