DocumentCode
3573982
Title
An MDL analysis framework for eQTL data
Author
Chalkidis, Georgios; Sugano, Sumio
Author_Institution
Dept. of Med. Genome Sci., Univ. of Tokyo, Tokyo, Japan
fYear
2014
Firstpage
1
Lastpage
7
Abstract
Rapid development of genome sequencing technologies enables novel insights into the mechanisms of complex disease through Big Data analysis. Physicians can now assay a patient's gene variants and gene expression patterns in a timely manner and use the obtained data to study an individual's susceptibility to complex disease and unravel the underlying mechanisms of disease pathogenesis. Massive amounts of correlated genotype, gene expression, and clinical data are collected in eQTL datasets. In this work, we propose an analysis framework based on the minimum description length principle for extracting useful information from eQTL data. This is achieved by minimizing the stochastic complexity of the data, using the universal normalized maximum likelihood code as the global code length optimization criterion. The algorithm simultaneously identifies disease-associated features, extracts the optimal model of the complex disease, and estimates its parameters. Applied to a simulated eQTL dataset, our framework successfully reveals the underlying mechanisms of a hypothetical complex disease interaction network.
Keywords
Big Data; data analysis; diseases; genomics; information retrieval; medical computing; optimisation; stochastic processes; Big Data analysis; MDL analysis framework; clinical data collection; complex disease mechanisms; correlated genotype; disease pathogenesis; eQTL datasets; expression quantitative trait loci data; gene expression patterns; genome sequencing technology; global code length optimization criterion; hypothetical complex disease interaction network; information extraction; minimum description length principle; patient gene variants; stochastic complexity; universal normalized maximum likelihood code; Bioinformatics; Biological system modeling; Complexity theory; Data models; Diseases; Gene expression; Genomics;
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Engineering (APWC on CSE), 2014 Asia-Pacific World Congress on
Print_ISBN
978-1-4799-1955-0
Type
conf
DOI
10.1109/APWCCSE.2014.7053841
Filename
7053841