next up previous
Next: Independence of the position Up: Probability of an evolutionary Previous: Probability of an evolutionary

How to compute this probability efficiently.

Let O be the point which identifies the root of the EPT. To compute the probability of a given position, we use the same idea as Dayhoff et al. [6] i.e. sum over all possible unknown amino acids o, w, x and y and evaluate all the mutation probabilities.

\begin{displaymath}Pr\{ \mbox{EC} \} = \sum_o f_o
\sum_x M^{d_X}_{xo} M^{d_4}_{...
...M^{d_3}_{C y}
\sum_w M_{wy}^{d_W} M^{d_1}_{K w} M^{d_2}_{R w}
\end{displaymath}

The interpretation of each part of the product is simple. For each branch between $U \rightarrow V$ we will have a term of the form MdVv u, for each unknown internal amino acid at node U we will have a sum for all its possibilities ($\sum_u$) and for the root we have the sum of each possible amino acid times its natural frequency of occurrence. Lower case letters denote the particular amino acid at a given position, e.g. x is the amino acid in a given position of sequence X. The upper case K, R and C indicate the known amino acids at the leaves.

As written, this formula is very expensive to compute, it requires $\vert\Sigma\vert^{k-1}$ products, where k is the number of sequences. We can reorganize this summation by the introduction of new vectors TX and SX for each internal or external node X. The value of the vector depends on the position of the root of the tree and it is computed as follows

These recursive definitions can be used if we do the computation from the leaves towards the root. Each S corresponding to an internal node requires $\vert\Sigma\vert^2$multiplications, and each T requires $\vert\Sigma\vert$multiplications. For k sequences, each of length m, this requires $m(k-1)\vert\Sigma\vert(\vert\Sigma\vert+1)$multiplications (420m(k-1) for proteins), which is not inexpensive, but perfectly feasible.

The computation of the probability of the EC is done with the two immediate descendants of the root, call them X and Y.

\begin{displaymath}Pr\{ \mbox{EC} \} = \sum_i f_i
\left ( \sum_x M^{d_X}_{x i} T...
...^{d_Y}_{y i} T^Y_{y} \right ) =
\sum_i f_i T_i^O = f \cdot T^O
\end{displaymath}

The PAS at the root of the tree can be computed as follows: for each amino acid at the root compute the probability of such a configuration

\begin{displaymath}Pr\{ \mbox{aa $i$\space at the root} \} = \mbox{PAS}_i =
\lef...
...\right )
\left ( \sum_y M^{d_Y}_{y i} T^Y_{y} \right ) = T_i^O
\end{displaymath}

where X and Y are the descendants of the root. These are probabilities of the entire EC, to find the relative probabilities for each amino acid the PAS is normalized to sum 1, i.e.

\begin{displaymath}\mbox{PAS} = T^O / \sum_i T_i^O \end{displaymath}


next up previous
Next: Independence of the position Up: Probability of an evolutionary Previous: Probability of an evolutionary
Gaston Gonnet
1998-07-14