• DocumentCode
    680285
  • Title

    An integrative computational approach to identify disease-specific networks from PubMed literature information

  • Author

    Yuji Zhang ; Dingchen Li ; Cui Tao ; Feichen Shen ; Hongfang Liu

  • Author_Institution
    Dept. of Health Sci. Res., Mayo Clinic, Rochester, MN, USA
  • fYear
    2013
  • fDate
    18-21 Dec. 2013
  • Firstpage
    72
  • Lastpage
    75
  • Abstract
    A large number of associations among biological entities (e.g., diseases, drugs, and genes) are scattered across the biomedical literature. Extracting and analyzing such heterogeneous data remains a challenging task for most researchers in the biomedical field. Natural language processing (NLP) has the potential to extract associations among biological entities from the literature. However, the association information extracted through NLP can be large, noisy, and redundant, which makes it difficult for biomedical researchers to use. To address this challenge, we propose a computational framework to facilitate the use of NLP results. We apply Latent Dirichlet Allocation (LDA) to discover topics based on associations. The associations extracted for each topic form a disease-specific network for downstream bioinformatics analysis. We illustrate the framework by constructing disease-specific networks from Semantic MEDLINE, an NLP-generated association database, and then analyzing network properties such as hub nodes and degree distribution. The results demonstrate that (1) the LDA-based approach can group related diseases into the same disease topic; and (2) the disease-specific association network exhibits the scale-free property, with hub nodes enriched in related diseases, genes, and drugs.
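    The network-analysis step mentioned in the abstract (hub nodes and degree distribution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the triples, entity names, and the function names `build_network`, `degree_distribution`, and `hub_nodes` are hypothetical, and the LDA topic-modeling step is not covered here.

```python
from __future__ import annotations

from collections import Counter, defaultdict

# Hypothetical (subject, predicate, object) associations for a single topic.
# Invented placeholders for illustration; not taken from Semantic MEDLINE.
ASSOCIATIONS = [
    ("drug_1", "TREATS", "disease_A"),
    ("drug_2", "TREATS", "disease_A"),
    ("gene_1", "ASSOCIATED_WITH", "disease_A"),
    ("gene_2", "ASSOCIATED_WITH", "disease_A"),
    ("gene_2", "ASSOCIATED_WITH", "disease_B"),
    ("drug_1", "TREATS", "disease_B"),
    ("drug_2", "TREATS", "disease_A"),  # redundant triple, collapsed below
]


def build_network(triples):
    """Build an undirected adjacency map, ignoring predicates and duplicates."""
    network: dict[str, set[str]] = defaultdict(set)
    for subject, _predicate, obj in triples:
        if subject == obj:
            continue  # skip self-loops
        network[subject].add(obj)
        network[obj].add(subject)
    return dict(network)


def degree_distribution(network):
    """Map each degree value to the number of nodes having that degree."""
    counts = Counter(len(neighbors) for neighbors in network.values())
    return dict(sorted(counts.items()))


def hub_nodes(network, top_k=1):
    """Return the top_k (node, degree) pairs, highest degree first."""
    ranked = sorted(
        ((node, len(neighbors)) for node, neighbors in network.items()),
        key=lambda pair: (-pair[1], pair[0]),  # ties broken alphabetically
    )
    return ranked[:top_k]


if __name__ == "__main__":
    net = build_network(ASSOCIATIONS)
    print("degree distribution:", degree_distribution(net))
    print("hub nodes:", hub_nodes(net, top_k=1))
```

    In a scale-free network the degree distribution is heavy-tailed: most nodes have few links while a small number of hubs have many. A toy graph of this size cannot demonstrate that property; the sketch only shows how the two quantities would be computed.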
  • Keywords
    bioinformatics; data analysis; diseases; drugs; genetics; medical information systems; natural language processing; semantic networks; NLP-generated association database; PubMed literature information; bioinformatics analysis; biological entities; biomedical field; biomedical literature; disease-specific association network; drugs; genes; heterogeneous data analysis; heterogeneous data extraction; hub nodes; integrative computational approach; latent Dirichlet allocation approach; natural language processing; scale-free network property; semantic MEDLINE; Abstracts; Biology; Data mining; Diseases; Drugs; Resource management; Semantics; Disease-specific network; Latent Dirichlet Allocation; Network Analysis; Semantic MEDLINE;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
  • Conference_Location
    Shanghai
  • Type

    conf

  • DOI
    10.1109/BIBM.2013.6732738
  • Filename
    6732738