DocumentCode :
65061
Title :
Gelsius: A Literature-Based Workflow for Determining Quantitative Associations between Genes and Biological Processes
Author :
Abate, F. ; Acquaviva, Andrea ; Ficarra, Elisa ; Piva, Roberto ; Macii, E.
Author_Institution :
Polytech. of Turin, Turin, Italy
Volume :
10
Issue :
3
Year :
2013
Date :
May-June 2013
Firstpage :
619
Lastpage :
631
Abstract :
An effective methodology for extracting and quantifying knowledge from biomedical literature would allow researchers to organize and analyze the results of high-throughput experiments based on microarrays and next-generation sequencing technologies. Despite the large amount of raw information available on the web, a tool able to extract a measure of the correlation between a list of genes and biological processes is not yet available. In this paper, we present Gelsius, a workflow that incorporates biomedical literature to quantify the correlation between genes and terms describing biological processes. To achieve this, we build different modules focusing on query expansion and document canonicalization. These modules improve the measurement of correlation, which is performed using a latent semantic analysis approach. To the best of our knowledge, this is the first complete tool able to extract a measure of gene-biological process correlation from literature. We demonstrate the effectiveness of the proposed workflow on six biological processes and a set of genes, by showing that correlation results for known relationships are in accordance with the definitions of gene functions provided by the NCI Thesaurus. In addition, the tool is able to propose new candidate relationships for later experimental validation. The tool is available at http://bioeda1.polito.it:8080/medSearchServlet/.
Keywords :
document handling; genetics; knowledge acquisition; medical computing; query processing; Gelsius; NCI thesaurus; Web; biological process; biomedical literature; document canonicalization; gene functions; knowledge extraction; latent semantic analysis approach; literature-based workflow; microarrays; next-generation sequencing technology; query expansion; Abstracts; Biological processes; Biomedical measurements; Correlation; Large scale integration; Semantics; Unified modeling language; UMLS; gene ontology; ontologies; text mining; thesaurus;
Language :
English
Journal_Title :
Computational Biology and Bioinformatics, IEEE/ACM Transactions on
Publisher :
IEEE
ISSN :
1545-5963
Type :
jour
DOI :
10.1109/TCBB.2013.11
Filename :
6468036