Probability of an evolutionary tree configuration

If we knew the sequences of the ancestral proteins at the internal nodes of the tree, the probability of the entire tree would be:

$$\Pr\{\text{tree}\} = \prod_{(X,Y) \in \text{edges}} \Pr\{X \to Y\} \qquad (4)$$

where $\Pr\{X \to Y\}$ is interpreted as the probability of mutating from the amino acid in node X to the amino acid in node Y according to the distance of the edge X-Y. The individual probabilities would be obtained by aligning the protein sequences at the beginning and end of each episode of evolution and scoring them using a Dayhoff matrix (for example) as described above. Normally, of course, the sequences of the ancestral proteins A, B and C are not known, as the organisms that contained them have long since died. In [Gonnet and Benner, 1996], a formula for the probability of an entire evolutionary configuration (the ensemble of the phylogenetic tree and the corresponding MSA) is derived. It corresponds exactly to the notion of computing the probability of traversing each edge of the tree. We can compute the probability of the tree configuration by multiplying the probabilities of all episodes represented by the edges or, equivalently, by adding their logarithms.
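To make the edge-by-edge computation concrete, here is a minimal Python sketch for the idealized case in which the ancestral sequences are known. It rests on several assumptions that are not part of the text: the sequences are already aligned and gap-free, positions evolve independently, and a toy equal-rates substitution model stands in for the Dayhoff matrices. The names `transition_prob`, `edge_log_prob` and `tree_log_prob`, as well as the example tree, are hypothetical. The per-edge probabilities are combined as a sum of logarithms, which is the same as taking the logarithm of their product and avoids numerical underflow.

```python
import math

K = 20  # size of the amino acid alphabet


def transition_prob(a, b, d):
    """Probability that residue a is replaced by b over distance d.

    Toy equal-rates model (an assumption, NOT a Dayhoff matrix);
    d is the expected number of substitutions per site and must be > 0.
    """
    decay = math.exp(-K * d / (K - 1))
    if a == b:
        return 1.0 / K + (K - 1) / K * decay
    return (1.0 - decay) / K


def edge_log_prob(seq_x, seq_y, d):
    """Log-probability of one episode X -> Y for aligned, gap-free sequences."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(math.log(transition_prob(a, b, d)) for a, b in zip(seq_x, seq_y))


def tree_log_prob(sequences, edges):
    """Sum of edge log-probabilities, i.e. the log of the product over edges."""
    return sum(edge_log_prob(sequences[x], sequences[y], d) for x, y, d in edges)


# Hypothetical tree: internal nodes A and B, leaves 1, 2 and 3.
sequences = {"A": "MKV", "B": "MKL", "1": "MKV", "2": "MRV", "3": "MKL"}
edges = [("A", "1", 0.1), ("A", "2", 0.3), ("A", "B", 0.2), ("B", "3", 0.1)]

log_p = tree_log_prob(sequences, edges)
print(f"log-probability of the tree: {log_p:.4f}")
```

The sketch multiplies transition probabilities only; it does not include a prior for the residues at the starting node, and it does not handle the realistic case of unknown ancestral sequences.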

Hence a tree scoring function