Title :
Information theory for DNA sequencing: Part I: A basic model
Author :
Motahari, Abolfazl ; Bresler, Guy ; Tse, David
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of California at Berkeley, Berkeley, CA, USA
Abstract :
DNA sequencing is the basic workhorse of modern-day biology and medicine. Shotgun sequencing is the dominant technique: many randomly located short fragments, called reads, are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information-theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and it provides a fundamental limit on the performance that any assembly algorithm can achieve. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process.
Keywords :
DNA; information theory; molecular biophysics; DNA base pair; DNA sequencing; read extraction; sequencing capacity; shotgun sequencing; statistical model; Algorithm design and analysis; Assembly; Bioinformatics; Decoding; Genomics; Greedy algorithms
Conference_Title :
Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
Conference_Location :
Cambridge, MA
Print_ISBN :
978-1-4673-2580-6
ISSN :
2157-8095
DOI :
10.1109/ISIT.2012.6284020