Author/Authors :
Baake, Ellen
Abstract :
We address questions of identifiability in molecular phylogeny, the art of reconstructing the history of a sample of sequences given just the sequences at the leaves of the phylogenetic tree. Here, the 'history' consists of the tree topology, plus the transition probabilities which define the Markov process of sequence evolution along the branches of the tree. It is assumed that sequences have infinite length, and the pairwise joint distributions of letters at the leaves are taken to be known. We focus on two cases: (1) If the sites of a sequence evolve identically and independently, the topology can be reconstructed, but the one-way edge transition matrices cannot. However, the return-trip transition matrices are reconstructible for every edge, up to conjugacy in the case of internal edges. (2) If a rate factor varies from site to site, different topologies may produce identical pairwise joint distributions, even under the same distribution of rate factors. Consequently, identifiability of the topology is lost on the basis of pairwise sequence comparisons, even if the distribution of rate factors is known. The results are discussed in the context of additive measures of phylogenetic distance.
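As an illustration of what an additive distance computed from pairwise joint distributions can look like, the following minimal sketch uses the paralinear (LogDet-type) distance, a standard choice in the literature. The abstract does not say which distance measures the paper uses, so this choice is an assumption. The two-state root distribution `pi` and the edge transition matrices `P1` and `P2` are hypothetical values chosen only for the example, which covers case (1), where all sites evolve identically and independently.

```python
import numpy as np

def paralinear_distance(joint):
    """Paralinear (LogDet-type) distance from a pairwise joint distribution.

    joint[i, j] is the probability of seeing letter i at the first node
    and letter j at the second node.
    """
    joint = np.asarray(joint, dtype=float)
    row = joint.sum(axis=1)  # marginal letter distribution at the first node
    col = joint.sum(axis=0)  # marginal letter distribution at the second node
    return -np.log(abs(np.linalg.det(joint)) / np.sqrt(row.prod() * col.prod()))

# Hypothetical two-leaf tree: a root r with leaves a and b (illustrative values).
pi = np.array([0.6, 0.4])                  # letter distribution at the root
P1 = np.array([[0.9, 0.1], [0.2, 0.8]])    # transition matrix on edge r -> a
P2 = np.array([[0.7, 0.3], [0.1, 0.9]])    # transition matrix on edge r -> b

F_ra = np.diag(pi) @ P1                    # joint distribution of (r, a)
F_rb = np.diag(pi) @ P2                    # joint distribution of (r, b)
F_ab = P1.T @ np.diag(pi) @ P2             # joint distribution of (a, b)

d_ra = paralinear_distance(F_ra)
d_rb = paralinear_distance(F_rb)
d_ab = paralinear_distance(F_ab)

# Additivity along the path a - r - b: the two numbers should agree.
print(d_ab, d_ra + d_rb)
```

In this sketch the leaf-to-leaf distance equals the sum of the two edge distances, which is the property that lets a tree topology be read off from pairwise comparisons when sites are identically and independently distributed. The sketch does not model case (2), rate variation across sites.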
Keywords :
Phylogenetic reconstruction, Additive distances, Identifiability, Markov processes on trees, Rate heterogeneity