DocumentCode :
3259567
Title :
Mining Information Extraction Models for HmtDB annotation
Author :
Berardi, Margherita ; Malerba, Donato ; Attimonelli, Marcella
Author_Institution :
Dipt. di Informatica, Univ. degli Studi di Bari
fYear :
2006
fDate :
Dec. 2006
Firstpage :
207
Lastpage :
212
Abstract :
Advances in genome sequencing techniques have given rise to an overwhelming increase in the literature on discovered genes, proteins and their roles in biological processes. However, the biomedical literature remains a largely unexploited source of biological information. Information extraction (IE) techniques are needed to map this information into structured representations that allow facts relating domain-relevant entities to be recognized automatically. In this paper, we present a framework that supports biologists in automatically extracting information from texts. The framework integrates a data mining module that discovers extraction rules from a set of manually labelled texts. The resulting extraction models are then applied automatically to unseen texts. We report an application to a real-world dataset composed of publications selected to support biologists in the annotation of the HmtDB database.
Keywords :
biology computing; data mining; genetics; information retrieval; proteins; text analysis; HmtDB annotation; HmtDB database; biological information; biological process; data mining; discovered genes; domain-relevant entities; extraction rules; genome sequencing; information extraction models; manually labelled texts; proteins; structured representations; Bioinformatics; Biological information theory; Biological system modeling; DNA; Data mining; Databases; Genetic mutations; Genomics; Pathology; Proteins;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Data Mining Workshops, 2006. ICDM Workshops 2006. Sixth IEEE International Conference on
Conference_Location :
Hong Kong
Print_ISBN :
0-7695-2702-7
Type :
conf
DOI :
10.1109/ICDMW.2006.113
Filename :
4063626