Title :
Workshop: Bioinformatics pipeline for fosmid based molecular haplotype sequencing
Author :
Duitama, Jorge ; Suk, Eun-Kyung ; Schulz, Sabrina ; McEwen, Gayle ; Huebsch, Thomas ; Hoehe, Margret
Author_Institution :
Genetic Variation, Haplotypes & Genetics of Complex Disease, Max Planck Inst. for Mol. Genetics, Berlin, Germany
Abstract :
A new bioinformatics pipeline for fosmid-based analysis was developed by extending the standard SOLiD pipeline for next-generation sequencing (NGS). The experimental approach starts by sequencing pools of up to 15000 DNA molecules called fosmids. Each fosmid has an average length of 40 kb and is sampled at random from the genome. The pipeline includes a fosmid detection algorithm, which clusters SOLiD reads aligned to the reference genome according to a custom-made set of proximity rules. It also includes a module that makes homozygous allele calls in regions identified as potential fosmid locations. These allele calls are collected in a matrix for single individual haplotyping. For this problem, the pipeline includes a new algorithm that tries to find the cut of fosmids consistent with their haplotype origin. The algorithm reduces the problem to the well-known NP-complete problem Max-CUT, which is solved approximately by combining well-known heuristics. Finally, the algorithm computes the consensus haplotypes, assuming that the cut is correct. Running the pipeline on 48 different pools phased 32347 SNPs in 102 blocks on chromosome 22 of an individual, with a predicted switch error rate of about 1%.
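The Max-CUT reduction and the consensus step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the edge weight (disagreements minus agreements on shared SNPs), the single-flip local search, and the names `edge_weight`, `local_search_cut` and `consensus` are all assumptions made for illustration, and the abstract does not say which heuristics the pipeline actually combines. Each fosmid is represented as a dictionary mapping a SNP position to its called allele (0 or 1).

```python
from itertools import combinations


def edge_weight(f, g):
    """Disagreements minus agreements over the SNPs covered by both fragments.

    This scoring is an assumption for illustration, not taken from the abstract.
    """
    w = 0
    for snp, a in f.items():
        b = g.get(snp)
        if b is not None:
            w += 1 if a != b else -1
    return w


def local_search_cut(frags):
    """Single-flip local search for a heavy cut (a heuristic, not an exact solver)."""
    n = len(frags)
    w = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w[i][j] = w[j][i] = edge_weight(frags[i], frags[j])
    side = [0] * n  # start with every fragment on side 0
    improved = True
    while improved:
        improved = False
        for i in range(n):
            # Flipping i gains the weight to its own side and loses the weight
            # it currently has across the cut.
            gain = sum(w[i][j] if side[j] == side[i] else -w[i][j]
                       for j in range(n) if j != i)
            if gain > 0:  # strict improvement, so the loop terminates
                side[i] ^= 1
                improved = True
    return side


def consensus(frags, side):
    """Majority-vote haplotypes, taking the cut as correct.

    Alleles of fragments on side 1 are complemented before voting; ties go to 0.
    """
    votes = {}
    for f, s in zip(frags, side):
        for snp, a in f.items():
            votes[snp] = votes.get(snp, 0) + (1 if (a ^ s) == 1 else -1)
    hap_a = {snp: int(v > 0) for snp, v in sorted(votes.items())}
    hap_b = {snp: 1 - a for snp, a in hap_a.items()}
    return hap_a, hap_b


# Toy input: five fragments over four heterozygous SNPs (hypothetical data).
frags = [
    {1: 0, 2: 0},
    {2: 1, 3: 1},
    {3: 0, 4: 0},
    {1: 1, 2: 1},
    {3: 1, 4: 1},
]
side = local_search_cut(frags)
hap_a, hap_b = consensus(frags, side)
```

On this toy input the first and third fragments end up on one side of the cut and the remaining three on the other, and the two consensus haplotypes are complementary at every SNP. Local search only guarantees a locally optimal cut, which is why a practical pipeline would likely combine it with other heuristics.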
Keywords :
DNA; bioinformatics; cellular biophysics; computational complexity; genomics; molecular biophysics; molecular configurations; optimisation; polymorphism; DNA molecules; NP-complete problem; SNP; SOLiD pipeline; chromosome; fosmid; genome; homozygous allele; max-CUT; molecular haplotype sequencing; proximity rules; switch error; Approximation algorithms; Biological cells; Pipelines
Conference_Title :
Computational Advances in Bio and Medical Sciences (ICCABS), 2011 IEEE 1st International Conference on
Conference_Location :
Orlando, FL
Print_ISBN :
978-1-61284-851-8
DOI :
10.1109/ICCABS.2011.5729923