Monotonicity of scores and distances



Naturally, we would expect a smaller PAM distance to correspond to a higher score, i.e. a higher probability that the sequences are related. We first show that this expectation holds when the sequences have the same length. To check whether smaller PAM distances correspond to larger scores, we examine the expected value of the score S_D(d) as a function of the PAM distance d for sequences of length n. The expected score S_D(d) per aligned pair of amino acids is

$\begin{displaymath}S_D(d) = \sum_{B=1}^{20} f_B \sum_{A=1}^{20} M_{AB}^d \cdot D_{AB,d} \end{displaymath}$

where D_AB,d is the score for aligning amino acids A and B at PAM distance d, and the probabilities are those of equation 1. In our case, D_d is the Dayhoff matrix for PAM distance d. The expected score for the whole sequences is obtained by multiplying this value by n. Calculating S_D(d) for each d from 0 to 200 gives the plot in Figure 2.
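As a rough illustration of how S_D(d) can be evaluated, the following Python sketch uses a made-up reversible 4-letter model in place of the 20 amino acids. The frequencies `f`, the exchangeabilities `s`, the step size 0.01, the convention that M_AB is the probability of B mutating to A, and the score definition D_AB,d = 10 log10(M^d_AB / f_A) are all assumptions for illustration; they are not the data behind Figure 2, and d here counts steps of the toy matrix rather than calibrated PAM units. The loop starts at d = 1 because M^0 is the identity, whose zero entries have no finite log-odds score.

```python
import numpy as np

# Hypothetical 4-letter alphabet standing in for the 20 amino acids.
f = np.array([0.4, 0.3, 0.2, 0.1])            # background frequencies f_B (invented)
s = np.array([[0.0, 1.0, 2.0, 1.0],
              [1.0, 0.0, 1.0, 3.0],
              [2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])           # symmetric exchangeabilities (invented)

Q = s * f[:, None]                             # rate of B -> A is s_AB * f_A
Q -= np.diag(Q.sum(axis=0))                    # make every column sum to zero
M = np.eye(4) + 0.01 * Q                       # column-stochastic one-step matrix with M @ f == f


def expected_score(d):
    """Expected score per aligned pair: sum_B f_B sum_A (M^d)_AB * D_AB,d."""
    Md = np.linalg.matrix_power(M, d)          # (M^d)_AB: probability of B -> A in d steps
    D = 10.0 * np.log10(Md / f[:, None])       # assumed Dayhoff-style log-odds score
    return float(np.sum(f[None, :] * Md * D))


scores = [expected_score(d) for d in range(1, 201)]
```

With this score definition the expected score is proportional to the mutual information between the two aligned residues, which is one way to see why it cannot increase as d grows in such a model.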

**Figure 2:** The expected score S_D(d) as a function of the PAM distance d.

The function is monotonically decreasing, so a larger PAM distance d does correspond to a lower expected score S_D(d). The graph, however, is only a "visual" proof. To prove monotonicity, we have to analyze the derivative of S_D(d) with respect to d. To compute this derivative, M^d can be rewritten as

$\begin{displaymath}M^d = U \Lambda^d U^{-1} \end{displaymath}$

where $\Lambda$ is a diagonal matrix containing the eigenvalues $\lambda_i$ of M and the columns of U are the corresponding eigenvectors. Each diagonal element of $\Lambda^d$ is $(\Lambda^d)_{ii}=\lambda_i^d$ . For any matrix E, S_E(d) can be rewritten as

$\begin{displaymath} S_E(d) = \sum_{B=1}^{20} f_B \sum_{A=1}^{20} (U \Lambda^d U^{-1})_{AB} \cdot E_{AB} \end{displaymath}$

(2)
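Equation 2 can be checked numerically. The sketch below, again for an invented reversible 4-letter model (all numbers are assumptions), builds M^d from the eigendecomposition and compares it with repeated matrix multiplication. The identity matrix used as E in the usage note is only an example of a score matrix.

```python
import numpy as np

# Hypothetical toy model; frequencies and exchangeabilities are invented.
f = np.array([0.4, 0.3, 0.2, 0.1])
s = np.array([[0.0, 1.0, 2.0, 1.0],
              [1.0, 0.0, 1.0, 3.0],
              [2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])
Q = s * f[:, None]
Q -= np.diag(Q.sum(axis=0))
M = np.eye(4) + 0.01 * Q                       # column-stochastic, M @ f == f

lam, U = np.linalg.eig(M)                      # M = U diag(lam) U^{-1}
U_inv = np.linalg.inv(U)


def M_power(d):
    """M^d = U Lambda^d U^{-1}; d may be non-integer since all lam > 0 here."""
    # U * lam**d scales column i of U by lam_i^d, i.e. U @ diag(lam**d).
    # .real drops any round-off imaginary part.
    return ((U * lam**d) @ U_inv).real


def S(E, d):
    """Equation 2: sum_B f_B sum_A (U Lambda^d U^{-1})_AB * E_AB."""
    return float(np.sum(f[None, :] * M_power(d) * E))
```

For example, `S(np.eye(4), d)` gives the expected fraction of unchanged positions after d steps of this toy model, and `M_power(0.5)` is a matrix square root of M, which repeated multiplication alone could not provide.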

If we multiply everything out we can rewrite S_E(d) as

$\begin{displaymath} S_E(d) = \sum_{B=1}^{20} f_B \sum_{A=1}^{20} E_{AB} \sum_{i=1}^{20} U_{Ai} U^{-1}_{iB} \lambda_i^d = \sum_{i=1}^{20} T_{ii} \lambda_i^d \end{displaymath}$

(3)

where T_ii is the i^th diagonal entry of the matrix T = U^{-1} F E^T U (F is a diagonal matrix with the frequencies F_ii = f_i). For a matrix E that does not depend on d, the derivative of S_E(d) with respect to d is

$\begin{displaymath}S'_E(d) = \sum_{i=1}^{20} T_{ii} \lambda_i^d \ln \lambda_i \end{displaymath}$

This derivative can be computed and verified to be negative for all d in the range 0 to 250 for the Dayhoff matrix in [9].
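The same check can be sketched in Python for a toy model. Everything numeric below is an assumption: the 4-letter model, the choice of a fixed score matrix E built as 10 log10(M^100_AB / f_A), and the step size. The code computes T_ii from T = U^{-1} F E^T U, evaluates equation 3 and its derivative, and scans the sign of the derivative over d = 0..250. It says nothing about the real Dayhoff matrix in [9].

```python
import numpy as np

# Hypothetical toy model; frequencies and exchangeabilities are invented.
f = np.array([0.4, 0.3, 0.2, 0.1])
s = np.array([[0.0, 1.0, 2.0, 1.0],
              [1.0, 0.0, 1.0, 3.0],
              [2.0, 1.0, 0.0, 1.0],
              [1.0, 3.0, 1.0, 0.0]])
Q = s * f[:, None]
Q -= np.diag(Q.sum(axis=0))
M = np.eye(4) + 0.01 * Q                       # column-stochastic, M @ f == f

lam, U = np.linalg.eig(M)
U_inv = np.linalg.inv(U)
F = np.diag(f)

# Fixed score matrix E (assumed log-odds form at 100 toy steps).
E = 10.0 * np.log10(np.linalg.matrix_power(M, 100) / f[:, None])

T_diag = np.diag(U_inv @ F @ E.T @ U)          # T_ii with T = U^{-1} F E^T U


def S_E(d):
    """Equation 3: sum_i T_ii * lam_i^d."""
    return float(np.sum(T_diag * lam**d).real)


def dS_E(d):
    """Derivative with respect to d: sum_i T_ii * lam_i^d * ln(lam_i)."""
    return float(np.sum(T_diag * lam**d * np.log(lam)).real)


def S_E_direct(d):
    """Same quantity from the definition, for integer d."""
    Md = np.linalg.matrix_power(M, d)
    return float(np.sum(f[None, :] * Md * E))


# Sign scan for this toy model only.
negative_everywhere = all(dS_E(d) < 0 for d in range(0, 251))
```

Comparing `S_E(d)` with `S_E_direct(d)` and `dS_E(d)` with a finite difference of `S_E` is a cheap way to catch index or transpose mistakes in T before trusting the sign scan.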


Chantal Korostensky
1999-07-14