DocumentCode :
2747021
Title :
Gene Name Automatic Recognition in Biomedical Literature
Author :
Yang, Zhihao ; Lin, Hongfei ; Zhao, Jing
Author_Institution :
Dept. of Comput. Sci. & Eng., Dalian Univ. of Technol.
Volume :
2
fYear :
0
fDate :
0-0 0
Firstpage :
9391
Lastpage :
9395
Abstract :
Identifying gene names in biomedical texts is a crucial step in text mining. Our approach combines a dictionary-based method with a machine-learning-based method. Using a gene name dictionary, an edit-distance approximate string searching algorithm was applied to improve the recall of gene recognition, which is otherwise greatly lowered by the lack of standard gene-naming conventions. Naive Bayes and SVM classifiers were then used to filter out false recognitions, thereby improving the precision of gene recognition. The experiments show that the classifiers greatly improve precision with a slight loss of recall, resulting in a much better F-score (from 53.7% to 67.6%).
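The dictionary-matching step described in the abstract can be illustrated with a minimal sketch. It assumes plain Levenshtein distance, case-insensitive comparison, whitespace-level tokens, and a distance threshold of 1; the function names, the threshold, and the sample dictionary are hypothetical and are not taken from the paper, whose exact matching procedure is not given in this record.

```python
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def match_candidates(tokens: Iterable[str],
                     dictionary: Iterable[str],
                     max_dist: int = 1) -> List[Tuple[str, str]]:
    """Return (token, dictionary entry) pairs within max_dist edits.

    max_dist is an assumed threshold, chosen only for illustration.
    """
    names = list(dictionary)
    hits = []
    for tok in tokens:
        for name in names:
            if edit_distance(tok.lower(), name.lower()) <= max_dist:
                hits.append((tok, name))
                break  # keep the first dictionary entry that is close enough
    return hits


if __name__ == "__main__":
    # Hypothetical two-entry dictionary; a spelling variant is still recognised.
    print(match_candidates(["the", "brca-1", "gene"], ["BRCA1", "TP53"]))
```

Loosening the threshold admits more spelling variants (higher recall) but also more false matches, which is the kind of error the abstract says the naive Bayes and SVM classifiers were used to filter out.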
Keywords :
biology computing; classification; data mining; dictionaries; genetics; learning (artificial intelligence); string matching; text analysis; SVM classifier; biomedical literature; biomedical texts; edit distance; gene name automatic recognition; gene name dictionary; machine learning; naive Bayes classifier; string searching; text mining; Biomedical engineering; Computer science; Dictionaries; Electronic mail; Epidermis; Machine learning; Support vector machine classification; Support vector machines; Text mining; Text recognition; Edit Distance; Naive Bayes Classifier; SVM Classifier; Text Mining;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Intelligent Control and Automation, 2006. WCICA 2006. The Sixth World Congress on
Conference_Location :
Dalian
Print_ISBN :
1-4244-0332-4
Type :
conf
DOI :
10.1109/WCICA.2006.1713819
Filename :
1713819