DocumentCode :
2057998
Title :
Generating Semantics for the Life Sciences via Text Analytics
Author :
Buyko, Ekaterina ; Hahn, Udo
Author_Institution :
Jena Univ. Language & Inf. Eng. (JULIE) Lab., Friedrich-Schiller-Univ. Jena, Jena, Germany
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
193
Lastpage :
196
Abstract :
The life sciences have a strong need for carefully curated, semantically rich fact repositories. Knowledge harvesting from unstructured textual sources is currently performed by highly skilled curators who manually feed semantics into such databases as a result of deep understanding of the documents chosen to populate such repositories. As this is a slow and costly process, we here advocate an automatic approach to the generation of database contents which is based on JREX, a high performance relation extraction system. As a real-life example, we target REGULONDB, the world's largest manually curated reference database for the transcriptional regulation network of E. coli. We investigate in our study the performance of automatic knowledge capture from various literature sources, such as PUBMED abstracts and associated full text articles. Our results show that we can, indeed, automatically re-create a considerable portion of the REGULONDB database by processing the relevant literature sources. Hence, this approach might help curators widen the knowledge acquisition bottleneck in this field.
Keywords :
biology computing; database management systems; knowledge acquisition; text analysis; JREX; REGULONDB database; automatic knowledge capture; database content generation; knowledge acquisition bottleneck; knowledge harvesting; life sciences; manual curated reference database; relation extraction system; semantic generation; text analytics; unstructured textual sources; Abstracts; Databases; Gene expression; Radio frequency; Semantics; Syntactics; Biomedical Text Mining; Event Extraction; Information Extraction;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Semantic Computing (ICSC), 2011 Fifth IEEE International Conference on
Conference_Location :
Palo Alto, CA
Print_ISBN :
978-1-4577-1648-5
Electronic_ISBN :
978-0-7695-4492-2
Type :
conf
DOI :
10.1109/ICSC.2011.75
Filename :
6061353