Title :
Haplotype assembly: An information theoretic view
Author :
Si, Hongbo ; Vikalo, Haris ; Vishwanath, Sriram
Author_Institution :
Dept. of Electr. & Comput. Eng., Univ. of Texas at Austin, Austin, TX, USA
Abstract :
This paper studies the haplotype assembly problem from an information-theoretic perspective. A haplotype is a sequence of nucleotide bases on a chromosome, often conveniently represented by a binary string, that differs from the bases in the corresponding positions on the other chromosome in a homologous pair. Information about the order of bases in a genome is readily inferred using short reads provided by high-throughput DNA sequencing technologies. Associating reads that cover variant positions with specific chromosomes in a homologous pair, which enables haplotype assembly, is challenging due to the limited lengths of the reads and the presence of sequencing errors. In this paper, the recovery of the target pair of haplotype sequences using short reads is rephrased as a joint source-channel coding problem. Two messages, representing haplotypes and chromosome memberships of reads, are encoded and transmitted over a channel with erasures and errors, where the channel model reflects salient features of high-throughput sequencing. The focus of this paper is on determining the number of reads required for reliable haplotype reconstruction; both necessary and sufficient conditions are presented, with order-wise optimal bounds.
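As a rough illustration of the channel view described in the abstract, the following Python sketch generates reads from a binary haplotype. It is not taken from the paper: the function name simulate_reads, the parameters erasure_prob and error_prob, the choice of independent per-position erasures, and the binary symmetric error model are all assumptions made for illustration. Real reads cover contiguous stretches of positions, which this sketch ignores.

```python
import random

ERASED = None  # hypothetical marker for a variant position a read does not cover


def simulate_reads(haplotype, num_reads, erasure_prob, error_prob, rng):
    """Toy model (assumed, not the paper's exact channel).

    haplotype    -- list of 0/1 values; the other chromosome is its complement
    num_reads    -- number of reads to generate
    erasure_prob -- probability that a position is not covered by a read
    error_prob   -- probability that a covered position is read incorrectly
    rng          -- a random.Random instance, passed in for reproducibility
    """
    reads, memberships = [], []
    for _ in range(num_reads):
        # Message 2: which chromosome of the pair this read comes from.
        membership = rng.randint(0, 1)
        row = []
        for base in haplotype:
            if rng.random() < erasure_prob:
                row.append(ERASED)  # erasure: position not observed
                continue
            # Message 1: the haplotype bit, complemented for the other chromosome.
            bit = base ^ membership
            if rng.random() < error_prob:
                bit ^= 1  # sequencing error: flip the observed bit
            row.append(bit)
        reads.append(row)
        memberships.append(membership)
    return reads, memberships


if __name__ == "__main__":
    rng = random.Random(0)
    reads, memberships = simulate_reads([0, 1, 1, 0, 1, 0], 5, 0.5, 0.05, rng)
    for row, membership in zip(reads, memberships):
        print(membership, row)
```

In this picture, haplotype assembly amounts to recovering both the haplotype and the memberships from the read matrix alone, and the question studied in the paper is how many reads are needed for that recovery to be reliable.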
Keywords :
DNA; cellular biophysics; genomics; molecular biophysics; molecular configurations; binary string; channel model; chromosome; genome; haplotype assembly problem; haplotype reconstruction; haplotype sequences; high-throughput DNA sequencing technology; homologous pair; information-theoretic perspective; joint source-channel coding problem; nucleotide sequence; Assembly; Bioinformatics; Biological cells; Decoding; Noise; Reliability; Sequential analysis;
Conference_Title :
Information Theory Workshop (ITW), 2014 IEEE
Conference_Location :
Hobart, TAS
DOI :
10.1109/ITW.2014.6970817