DocumentCode :
3123859
Title :
Information theory for DNA sequencing: Part I: A basic model
Author :
Motahari, Abolfazl ; Bresler, Guy ; Tse, David
Author_Institution :
Dept. of Electr. Eng. & Comput. Sci., Univ. of California at Berkeley, Berkeley, CA, USA
fYear :
2012
fDate :
1-6 July 2012
Firstpage :
2741
Lastpage :
2745
Abstract :
DNA sequencing is the basic workhorse of modern-day biology and medicine. Shotgun sequencing is the dominant technique: many randomly located short fragments, called reads, are extracted from the DNA sequence, and these reads are assembled to reconstruct the original sequence. By drawing an analogy between the DNA sequencing problem and the classic communication problem, we define an information-theoretic notion of sequencing capacity. This is the maximum number of DNA base pairs that can be resolved reliably per read, and it provides a fundamental limit on the performance that can be achieved by any assembly algorithm. We compute the sequencing capacity explicitly for a simple statistical model of the DNA sequence and the read process.
Keywords :
DNA; information theory; molecular biophysics; DNA base pair; DNA sequencing; read extraction; sequencing capacity; shotgun sequencing; statistical model; Algorithm design and analysis; Assembly; Bioinformatics; Decoding; Genomics; Greedy algorithms
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on
Conference_Location :
Cambridge, MA
ISSN :
2157-8095
Print_ISBN :
978-1-4673-2580-6
Electronic_ISBN :
2157-8095
Type :
conf
DOI :
10.1109/ISIT.2012.6284020
Filename :
6284020