DocumentCode :
3084916
Title :
Extracting Biomarker Information Applying Natural Language Processing and Machine Learning
Author :
Islam, Md Tawhidul ; Shaikh, Mostafa ; Nayak, Abhaya ; Ranganathan, Shoba
Author_Institution :
Dept. of Chem. & Biomol. Sci., Macquarie Univ., North Ryde, NSW, Australia
fYear :
2010
fDate :
18-20 June 2010
Firstpage :
1
Lastpage :
4
Abstract :
In this paper, we detail an approach to a very specific information extraction task, namely extracting biomarker information from biomedical literature. Starting with the abstract of a given publication, we first identify the evaluative sentence(s) among the other sentences by recognizing words and phrases in the text that belong to semantic categories of interest for biomedical entities (i.e., semantic category recognition). For entities such as protein, gene, and disease, we determine whether the statement refers to a biomarker relationship (i.e., assertion classification). Finally, we identify the biomarker relationship among the biomedical entities (i.e., semantic relationship classification). The system, the Biomarker Information Extraction Tool (BIET), implements machine learning-based biomarker extraction using support vector machines (SVMs). The system is trained and tested on a corpus of oncology-related PubMed/MEDLINE literature hand-annotated with biomarker information. We investigate the effectiveness of different features for this task and examine the amount of training data needed to learn the biomarker relationships among the entities. Our system achieved an average F-score of 86% on the biomarker information extraction task when compared with the human-annotated dataset (i.e., the gold standard).
Keywords :
bioinformatics; data mining; genetic algorithms; information retrieval; learning (artificial intelligence); natural language processing; support vector machines; text analysis; F-score; assertion classification; bio-medical entities; biomarker information extraction tool; biomedical literature; disease; gene; machine learning; machine learning-based biomarker extraction; natural language processing; oncology corpus; protein; semantic category recognition; support vector machines; word recognition; Biomarkers; Data mining; Diseases; Machine learning; Natural language processing; Proteins; Support vector machine classification; Support vector machines; System testing; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Bioinformatics and Biomedical Engineering (iCBBE), 2010 4th International Conference on
Conference_Location :
Chengdu
ISSN :
2151-7614
Print_ISBN :
978-1-4244-4712-1
Electronic_ISBN :
2151-7614
Type :
conf
DOI :
10.1109/ICBBE.2010.5514717
Filename :
5514717