• DocumentCode
    3573982
  • Title
    An MDL analysis framework for eQTL data
  • Author
    Chalkidis, Georgios; Sugano, Sumio
  • Author_Institution
    Dept. of Med. Genome Sci., Univ. of Tokyo, Tokyo, Japan
  • fYear
    2014
  • Firstpage
    1
  • Lastpage
    7
  • Abstract
    Rapid development of genome sequencing technologies enables novel insights into the mechanisms of complex disease through Big Data analysis. Physicians can now assay a patient's gene variants and gene expression patterns in a timely manner and use the resulting data to study an individual's susceptibility to complex disease and to unravel the underlying mechanisms of disease pathogenesis. Massive amounts of correlated genotype, gene expression, and clinical data are collected in eQTL datasets. In this work, we propose an analysis framework based on the minimum description length principle for extracting useful information from eQTL data. The framework minimizes the stochastic complexity of the data, using the universal normalized maximum likelihood code as the global code length optimization criterion. The algorithm simultaneously identifies disease-associated features, extracts the optimal model of the complex disease, and estimates its parameters. Applied to a simulated eQTL dataset, our framework successfully reveals the underlying mechanisms of a hypothetical complex disease interaction network.
  • Keywords
    Big Data; data analysis; diseases; genomics; information retrieval; medical computing; optimisation; stochastic processes; Big Data analysis; MDL analysis framework; clinical data collection; complex disease mechanisms; correlated genotype; disease pathogenesis; eQTL datasets; expression quantitative trait loci data; gene expression patterns; genome sequencing technology; global code length optimization criterion; hypothetical complex disease interaction network; information extraction; minimum description length principle; patient gene variants; stochastic complexity; universal normalized maximum likelihood code; Bioinformatics; Biological system modeling; Complexity theory; Data models; Diseases; Gene expression; Genomics
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    2014 Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE)
  • Print_ISBN
    978-1-4799-1955-0
  • Type
    conf
  • DOI
    10.1109/APWCCSE.2014.7053841
  • Filename
    7053841