Title :
More on reconstructing strings from random traces: insertions and deletions
Author :
Kannan, Sampath ; McGregor, Andrew
Author_Institution :
Dept. Comput. & Inf. Sci., Pennsylvania Univ., Philadelphia, PA
Abstract :
We are given a collection of m received strings or traces that have been independently generated by randomly inserting and deleting bits from a common string t of length n. Our goal is to reconstruct the string t from these observed traces. This paper considers both the algorithms for doing this reconstruction and seeks to understand the error rates at which reconstruction is possible. Note the difference from the typical coding theory scenario rather than trying to infer a codeword from a single received word, we are interested in inferring an arbitrary (or near arbitrary) word from multiple, independently generated received words. We present two main results. Firstly we show that for almost all transmitted strings, if the deletion/insertion error probability is O(1/log2 n) then with m = O(log n) traces we can exactly reconstruct the transmitted string with high probability. Furthermore we can still reconstruct in the presence of additional noise that flips each bit with constant probability. Secondly, for arbitrary strings (with no run of length > nepsi) we show that with a constant number of received strings we can reconstruct when the deletion/insertion probability is O(1/n1/2+epsi). This paper continues work initiated in Batu et. al. (2004) which considered only deletion errors. Our setting can be viewed as the study of an idealized biological evolutionary process where the DNA string undergoes point mutations, deletions and insertions. Our goal is to understand at what mutation rates, a small number of observed samples can be correctly aligned to reconstruct the parent string
Keywords :
encoding; error statistics; DNA string; arbitrary strings; biological evolutionary process; coding theory; error probability; random traces; reconstructing strings; Binary codes; Biological information theory; DNA; Decoding; Error analysis; Error correction codes; Error probability; Genetic mutations; Information science;
Conference_Titel :
Information Theory, 2005. ISIT 2005. Proceedings. International Symposium on
Conference_Location :
Adelaide, SA
Print_ISBN :
0-7803-9151-9
DOI :
10.1109/ISIT.2005.1523342