
Monotonicity of scores and distances

Naturally, we would expect a lower PAM distance to correspond to a higher score, i.e. to a higher probability that the sequences are related. We first show that this assumption holds when the sequences have the same length. To check whether smaller PAM distances correspond to larger scores, we look at the expected score $S_D(d)$ as a function of the PAM distance $d$ for sequences of length $n$. The expected score $S_D(d)$ per aligned pair of amino acids is

\begin{displaymath}S_D(d) = \sum_{B=1}^{20} f_B \sum_{A=1}^{20} M_{AB}^d \cdot D_{AB,d} \end{displaymath}

where $D_{AB,d}$ is the score of amino acids $A$ and $B$ at PAM distance $d$, and the probabilities $f_B$ and $M_{AB}^d$ are those of equation 1. In our case, $D_d$ is the Dayhoff matrix for PAM distance $d$. The expected score for a pair of whole sequences is $n \cdot S_D(d)$. Computing $S_D(d)$ for every $d$ from 0 to 200 gives the plot in Figure 2.
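To make the definition concrete, here is a minimal Python sketch that evaluates $S_D(d)$ on a toy three-letter alphabet instead of the 20 amino acids. Everything in it is an assumption for illustration: the frequencies `f`, the symmetric pair frequencies `pair_freq` used to build the 1-PAM matrix `M` (with `M[a][b]` read as the probability that `b` mutates to `a`), the sequence length `n`, and the log-odds form $D_{AB,d} = 10 \log_{10}\left((M^d)_{AB}/f_A\right)$, which we assume here because this section does not restate how $D_d$ is built. All function names are hypothetical.

```python
import math


def build_mutation_matrix(f, pair_freq):
    """Toy 1-PAM matrix with M[a][b] = Pr(b mutates to a) (hypothetical construction).

    Off-diagonal entries come from symmetric pair frequencies,
    pair_freq[a][b] = f[b] * M[a][b]; each column then sums to 1.
    """
    k = len(f)
    M = [[pair_freq[a][b] / f[b] if a != b else 0.0 for b in range(k)] for a in range(k)]
    for b in range(k):
        M[b][b] = 1.0 - sum(M[a][b] for a in range(k))  # diagonal is still 0 here
    return M


def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_pow(M, d):
    """M^d for an integer d >= 0, by repeated multiplication."""
    R = [[1.0 if i == j else 0.0 for j in range(len(M))] for i in range(len(M))]
    for _ in range(d):
        R = mat_mul(R, M)
    return R


def expected_dayhoff_score(M, f, d):
    """S_D(d) per aligned pair, ASSUMING D_{AB,d} = 10 log10((M^d)[A][B] / f_A)."""
    Md = mat_pow(M, d)
    total = 0.0
    for b, fb in enumerate(f):
        for a, fa in enumerate(f):
            p = Md[a][b]
            if p > 0.0:  # pairs with probability 0 contribute nothing
                total += fb * p * 10.0 * math.log10(p / fa)
    return total


# Hypothetical 3-letter alphabet; the paper uses the 20 amino acids.
f = [0.5, 0.3, 0.2]
pair_freq = [[0.0, 0.006, 0.004],
             [0.006, 0.0, 0.002],
             [0.004, 0.002, 0.0]]
M = build_mutation_matrix(f, pair_freq)

curve = [expected_dayhoff_score(M, f, d) for d in range(0, 201, 10)]
n = 150  # hypothetical sequence length
print("whole-alignment score at 50 PAM:", n * expected_dayhoff_score(M, f, 50))
```

Because `mat_pow` multiplies by `M` once per PAM unit, this sketch handles integer distances only; the eigendecomposition of $M$ lifts that restriction.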
  
Figure 2: The expected score $S_D(d)$ as a function of the PAM distance $d$.
\begin{figure}
\begin{center}
\mbox{\psfig{file=scorepam.eps,height=0.25\textheight,angle=0} }
\end{center}
\end{figure}

The function is monotonically decreasing: a larger PAM distance $d$ does correspond to a lower score $S_D(d)$. The graph, however, is only visual evidence, not a proof. To prove the claim, we analyze the derivative of $S_D(d)$ with respect to $d$. To compute this derivative, we write $M^d$ in terms of the eigendecomposition of $M$:

\begin{displaymath}M^d = U \Lambda^d U^{-1} \end{displaymath}

where the columns of $U$ are eigenvectors of $M$ and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues $\lambda_i$, so that $(\Lambda^d)_{ii} = \lambda_i^d$. For any score matrix $E$, the expected score $S_E(d)$ can then be written as

\begin{displaymath}
S_E(d) = \sum_{B=1}^{20} f_B \sum_{A=1}^{20} \left(U \Lambda^d U^{-1}\right)_{AB} \cdot E_{AB} \qquad (2)
\end{displaymath}
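The factorization $M^d = U \Lambda^d U^{-1}$ also defines $M^d$ for non-integer $d$. The minimal Python sketch below illustrates this on a hypothetical three-letter toy. It additionally assumes that $M$ is reversible, $f_B M_{AB} = f_A M_{BA}$, so that $F^{-1/2} M F^{1/2}$ is symmetric and a plain cyclic Jacobi eigensolver (our choice, written out so the block needs no external library) yields $U$ and $U^{-1}$. The data and all function names are hypothetical.

```python
import math


def build_mutation_matrix(f, pair_freq):
    """Toy reversible 1-PAM matrix, M[a][b] = Pr(b mutates to a) (hypothetical)."""
    k = len(f)
    M = [[pair_freq[a][b] / f[b] if a != b else 0.0 for b in range(k)] for a in range(k)]
    for b in range(k):
        M[b][b] = 1.0 - sum(M[a][b] for a in range(k))
    return M


def jacobi_eigen(S, sweeps=50):
    """Eigenvalues and orthonormal eigenvectors (columns of V) of a symmetric S, cyclic Jacobi."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # Rotation J in the (p, q) plane chosen so that (J^T A J)[p][q] = 0.
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):  # A <- J^T A
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(n)], V


def eigendecompose_reversible(M, f):
    """Return lam, U, Uinv with M = U diag(lam) Uinv.

    ASSUMES reversibility, f[b] * M[a][b] == f[a] * M[b][a], so that
    S = F^{-1/2} M F^{1/2} is symmetric, S = Q diag(lam) Q^T,
    U = F^{1/2} Q and Uinv = Q^T F^{-1/2}.
    """
    n = len(f)
    r = [math.sqrt(x) for x in f]
    # Average S with its transpose to remove rounding asymmetry.
    S = [[0.5 * (M[a][b] * r[b] / r[a] + M[b][a] * r[a] / r[b]) for b in range(n)]
         for a in range(n)]
    lam, Q = jacobi_eigen(S)
    U = [[r[a] * Q[a][i] for i in range(n)] for a in range(n)]
    Uinv = [[Q[b][i] / r[b] for b in range(n)] for i in range(n)]
    return lam, U, Uinv


def mutation_power(M, f, d):
    """M^d = U Lambda^d U^{-1}; any real d >= 0 works when every eigenvalue is positive."""
    lam, U, Uinv = eigendecompose_reversible(M, f)
    n = len(f)
    return [[sum(U[a][i] * lam[i] ** d * Uinv[i][b] for i in range(n)) for b in range(n)]
            for a in range(n)]


f = [0.5, 0.3, 0.2]  # hypothetical frequencies
pair_freq = [[0.0, 0.006, 0.004], [0.006, 0.0, 0.002], [0.004, 0.002, 0.0]]
M = build_mutation_matrix(f, pair_freq)
lam, U, Uinv = eigendecompose_reversible(M, f)
half = mutation_power(M, f, 0.5)  # a "half-PAM" mutation matrix
print("eigenvalues of M:", sorted(lam, reverse=True))
```

A general non-symmetric eigensolver would remove the need for the reversibility assumption. The symmetrised route is used here only because a symmetric matrix is guaranteed to have real eigenvalues and orthonormal eigenvectors.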

Expanding the matrix product entrywise, $(U \Lambda^d U^{-1})_{AB} = \sum_{i=1}^{20} U_{Ai} \lambda_i^d U^{-1}_{iB}$, we can rewrite $S_E(d)$ as

\begin{displaymath}
S_E(d) = \sum_{B=1}^{20} f_B \sum_{A=1}^{20} E_{AB} \sum_{i=1}^{20} U_{Ai} U^{-1}_{iB} \lambda_i^d = \sum_{i=1}^{20} T_{ii} \lambda_i^d \qquad (3)
\end{displaymath}

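where $T_{ii} = \sum_{A=1}^{20} \sum_{B=1}^{20} U^{-1}_{iB} f_B E_{AB} U_{Ai}$ is the $i$-th diagonal entry of the matrix $T = U^{-1} F E^T U$, and $F$ is the diagonal matrix of amino acid frequencies, $F_{ii} = f_i$. Since $\lambda_i^d = e^{d \ln \lambda_i}$ for $\lambda_i > 0$, the derivative with respect to $d$ is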

\begin{displaymath}S'_E(d) = \sum_{i=1}^{20} T_{ii} \lambda_i^d \ln \lambda_i \end{displaymath}

and can be computed numerically. For the Dayhoff matrix in [9] it is negative for all $d$ in the range 0 to 250, so the expected score decreases strictly with the PAM distance over this range.
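A sketch of such a numerical check follows. It runs on a hypothetical three-letter toy, not on the Dayhoff matrix of [9]. It computes the eigenvalues $\lambda_i$ and the diagonal entries $T_{ii}$, evaluates $S'_E(d) = \sum_i T_{ii} \lambda_i^d \ln \lambda_i$ for every integer $d$ from 0 to 250, and lets the result be compared with a finite-difference derivative. The following are all assumptions for illustration: the mutation matrix, the reversibility assumption behind the Jacobi eigensolver, the score matrix `E` (a made-up symmetric matrix standing in for a Dayhoff matrix), and every function name.

```python
import math


def build_mutation_matrix(f, pair_freq):
    """Toy reversible 1-PAM matrix, M[a][b] = Pr(b mutates to a) (hypothetical)."""
    k = len(f)
    M = [[pair_freq[a][b] / f[b] if a != b else 0.0 for b in range(k)] for a in range(k)]
    for b in range(k):
        M[b][b] = 1.0 - sum(M[a][b] for a in range(k))
    return M


def jacobi_eigen(S, sweeps=50):
    """Eigenvalues and orthonormal eigenvectors (columns of V) of a symmetric S, cyclic Jacobi."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < 1e-30:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])  # zeroes A[p][q]
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                for k in range(n):  # A <- J^T A
                    A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
                for k in range(n):  # V <- V J
                    V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
    return [A[i][i] for i in range(n)], V


def eigendecompose_reversible(M, f):
    """M = U diag(lam) Uinv, ASSUMING f[b] * M[a][b] == f[a] * M[b][a]."""
    n = len(f)
    r = [math.sqrt(x) for x in f]
    S = [[0.5 * (M[a][b] * r[b] / r[a] + M[b][a] * r[a] / r[b]) for b in range(n)]
         for a in range(n)]  # S = F^{-1/2} M F^{1/2}, symmetric under the assumption
    lam, Q = jacobi_eigen(S)
    U = [[r[a] * Q[a][i] for i in range(n)] for a in range(n)]     # U = F^{1/2} Q
    Uinv = [[Q[b][i] / r[b] for b in range(n)] for i in range(n)]  # Uinv = Q^T F^{-1/2}
    return lam, U, Uinv


def t_diagonal(U, Uinv, f, E):
    """T_ii = sum_{A,B} Uinv[i][B] f_B E[A][B] U[A][i], the diagonal of U^{-1} F E^T U."""
    n = len(f)
    return [sum(Uinv[i][b] * f[b] * E[a][b] * U[a][i] for a in range(n) for b in range(n))
            for i in range(n)]


def expected_score(lam, T, d):
    """S_E(d) = sum_i T_ii lam_i^d."""
    return sum(t * l ** d for l, t in zip(lam, T))


def expected_score_derivative(lam, T, d):
    """S'_E(d) = sum_i T_ii lam_i^d ln(lam_i); requires every lam_i > 0."""
    return sum(t * l ** d * math.log(l) for l, t in zip(lam, T))


f = [0.5, 0.3, 0.2]  # hypothetical frequencies
pair_freq = [[0.0, 0.006, 0.004], [0.006, 0.0, 0.002], [0.004, 0.002, 0.0]]
M = build_mutation_matrix(f, pair_freq)
# Hypothetical symmetric, positive definite score matrix (leading minors 2, 5, 2).
E = [[2.0, -1.0, -2.0],
     [-1.0, 3.0, -1.0],
     [-2.0, -1.0, 4.0]]
lam, U, Uinv = eigendecompose_reversible(M, f)
T = t_diagonal(U, Uinv, f, E)
negative = all(expected_score_derivative(lam, T, d) < 0.0 for d in range(0, 251))
print("S'_E(d) < 0 for every integer d in 0..250:", negative)
```

Under the reversibility assumption, $T_{ii} = \mathbf{q}_i^{T} F^{1/2} E F^{1/2} \mathbf{q}_i$, where $\mathbf{q}_i$ is the $i$-th eigenvector of the symmetrised matrix. Because the toy `E` is positive definite, every $T_{ii}$ is positive. The $\lambda = 1$ term vanishes because $\ln 1 = 0$, and the remaining terms are negative, so the derivative is negative for every $d$. A score matrix that is not positive definite carries no such guarantee, which is why the sign has to be verified numerically.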
Chantal Korostensky
1999-07-14