Title :
Modeling exome sequencing data with generalized Gaussian distribution with application to copy number variation detection
Author :
Junbo Duan ; Mingxi Wan ; Hong-Wen Deng ; Yu-Ping Wang
Author_Institution :
Dept. of Biomed. Eng., Xi'an Jiaotong Univ., Xi'an, China
Abstract :
Exome sequencing provides an effective way to discover genetic factors that might be associated with phenotypes of complex diseases. Compared with whole-genome sequencing, exome sequencing can satisfy the requirement for high sequencing coverage under a limited budget. However, because exons are sparsely distributed along the genome and technical variability exists between samples, the analysis of exome sequencing data is complicated, and directly applying current methods designed for whole-genome sequencing yields incorrect results. In this paper, we propose a novel model to represent exome sequencing data. Under this model, we show that the technical variability as well as the random sequencing error follow the generalized Gaussian distribution. Based on this observation, we propose a method to detect copy number variation. Studies on real data from the 1000 Genomes Project validate the proposed algorithm.
Keywords :
Gaussian distribution; biology computing; genetics; genomics; 1000 Genomes Project; complex diseases; copy number variation detection; exome sequencing data modeling; generalized Gaussian distribution; genetic factors; high sequencing coverage requirement; limited budget constraint; phenotypes; random sequencing error; whole-genome sequencing targeted methods; Bioinformatics; Data models; Optimization; Sequential analysis; Vectors; Next generation sequencing; copy number variation; exome sequencing; iteratively reweighted least squares;
Conference_Title :
Bioinformatics and Biomedicine (BIBM), 2013 IEEE International Conference on
Conference_Location :
Shanghai
DOI :
10.1109/BIBM.2013.6732619