
Example

We use a random matrix $M$ as an example. For simplicity we assume that there are only 4 amino acids, i.e. $M$ is a $4\times 4$ matrix. $M$ is generated to satisfy the usual properties: each column must sum to 1, and the pseudo-symmetry condition $f_i M_{ji} = f_j M_{ij}$ must hold.

$\begin{displaymath}M=\left [\begin {array}{cccc} 0.9920& 0.007612& 0.001144& 0.0... ... } 0.002821& 0.003806& 0.003051& 0.9860\end {array} \right ] \end{displaymath}$

The frequency vector $f$ can be computed from the eigenvector of $M$ whose eigenvalue is 1, or from the first row and first column of $M$ [1]:

$\begin{displaymath}f_i = \frac{ M_{i1}} {M_{1i} \sum_j M_{j1}/M_{1j} } \end{displaymath}$

$\begin{displaymath}f_1=.3003\;\;\;f_2=.1484\;\;\;f_3=.3702\;\;\;f_4=.1811\;\;\;\end{displaymath}$
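The first row/column formula can be checked numerically. The sketch below is an illustration only: the full matrix $M$ of the example is not reproduced here, so it builds a different, hypothetical matrix from an assumed symmetric table `Q` (with $Q_{ij} = f_j M_{ij}$) and the published frequencies, and then recovers $f$ from that matrix. The names `Q` and `frequencies_from_first_row_and_column` are assumptions made for this sketch.

```python
# Published frequencies from the example (rounded to four digits).
f = [0.3003, 0.1484, 0.3702, 0.1811]

# Hypothetical symmetric values Q[i][j] = f_j * M_ij (i != j).
# These are NOT the data behind the matrix M of the example.
Q = [
    [0.0,    0.0010, 0.0008, 0.0005],
    [0.0010, 0.0,    0.0004, 0.0006],
    [0.0008, 0.0004, 0.0,    0.0011],
    [0.0005, 0.0006, 0.0011, 0.0],
]

n = len(f)

# Off-diagonal entries M_ij = Q_ij / f_j, so f_j * M_ij is symmetric,
# which is exactly the pseudo-symmetry condition f_i M_ji = f_j M_ij.
M = [[Q[i][j] / f[j] if i != j else 0.0 for j in range(n)] for i in range(n)]

# Diagonal entries are chosen so that every column sums to 1.
for j in range(n):
    M[j][j] = 1.0 - sum(M[i][j] for i in range(n) if i != j)


def frequencies_from_first_row_and_column(M):
    """f_i = (M_i1 / M_1i) / sum_j (M_j1 / M_1j), with 0-based indices."""
    ratios = [M[i][0] / M[0][i] for i in range(len(M))]
    total = sum(ratios)
    return [r / total for r in ratios]


f_rec = frequencies_from_first_row_and_column(M)

# f is also the eigenvector of M for eigenvalue 1: M f = f.
Mf = [sum(M[i][j] * f[j] for j in range(n)) for i in range(n)]
```

The formula works because pseudo-symmetry gives $f_i = f_1 M_{i1}/M_{1i}$, and the normalization $\sum_i f_i = 1$ fixes $f_1$.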

The eigenvalues of M are

$\begin{displaymath}\lambda_1=1.0000\;\;\;\lambda_2=.9914\;\;\;\lambda_3=.9830\;\;\;\lambda_4=.9786\;\;\;\end{displaymath}$

In this case we want to find the optimal scoring matrix for $d$ in the range $100 \leq d \leq 200$. After setting $E_{11}=-1$ and $E_{12}=1$ and numerically minimizing $\int_{100}^{200}F(t)\,dt$ over the remaining unknowns $E_{ij}$, we obtain:

$\begin{displaymath}E = \left [\begin {array}{cccc} - 1.0& 1.0& 3.522& 1.720 \\... ...{\medskip } 1.720& 1.331& 1.601& - 1.596\end {array}\right ] \end{displaymath}$

From these values, using equation 3, we can compute

$\begin{displaymath}S(d) = 1.258-.3307\,(.9786)^d-1.459\,(.9914)^d-.5098\,(.9830)^d\end{displaymath}$

The derivative of S(d) with respect to d is:

$\begin{displaymath}S^{\prime}(d) = .007136\,(.9786)^d+.01264\,(.9914)^d+.008734\,(.9830)^d\end{displaymath}$

and we can see that all the terms are strictly positive, so $S(d)$ is strictly increasing for all $d$. Suppose we have aligned these two sequences:

$\begin{displaymath}\begin{array}{c} A_3A_2A_1A_2A_4A_3A_4A_2\\ A_3A_4A_1A_3A_1A_1A_2A_2\end{array} \end{displaymath}$

The score for this alignment is

$\begin{displaymath}\frac{w}{n} = \frac{E_{33}+E_{24}+E_{11}+E_{23}+E_{41}+E_{31}+E_{42}+E_{22}}{8} = .8076\end{displaymath}$
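The score is simply the mean of the scoring-matrix entries selected by the aligned pairs. A minimal sketch of that lookup follows; since the full matrix $E$ is not reproduced here, the sketch uses a placeholder matrix `E_toy` (an assumption, not the $E$ of the example), and the function name `alignment_score` is likewise hypothetical.

```python
def alignment_score(top, bottom, E):
    """Mean per-position score w/n of a gap-free alignment.

    top, bottom: sequences of 1-based amino acid indices (A_1 ... A_4).
    E: scoring matrix as a list of rows, indexed 0-based internally.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(top, bottom))
    w = sum(E[i - 1][j - 1] for i, j in pairs)
    return w / len(pairs)


# The two aligned sequences of the example, written as index lists.
top = [3, 2, 1, 2, 4, 3, 4, 2]
bottom = [3, 4, 1, 3, 1, 1, 2, 2]

# Placeholder scoring matrix, NOT the E of the example:
# +1 for identical residues, -1 otherwise.
E_toy = [[1.0 if i == j else -1.0 for j in range(4)] for i in range(4)]

pairs = list(zip(top, bottom))
toy_score = alignment_score(top, bottom, E_toy)
```

With the actual $E$ in place of `E_toy`, the same call would evaluate the sum shown above.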

From this value, and using any numerical method, we can find the $d^*$ which satisfies the equation

$\begin{displaymath}S(d^*) = .8076\end{displaymath}$

The solution is

$\begin{displaymath}d^*=149.99\end{displaymath}$
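Because $S(d)$ is strictly increasing, a simple bisection is one possible numerical method for this step. The sketch below uses the four-digit coefficients printed above, so its root lands close to 150 but need not reproduce 149.99 exactly; the bracket `[0, 2000]`, the tolerance, and the function names are assumptions made for illustration.

```python
import math

# Coefficients and eigenvalues of S(d), as printed (rounded to four digits).
S_INF = 1.258
TERMS = [(0.3307, 0.9786), (1.459, 0.9914), (0.5098, 0.9830)]


def S(d):
    """Expected per-position score at distance d."""
    return S_INF - sum(c * lam ** d for c, lam in TERMS)


def S_prime(d):
    """Derivative of S; each term is -c * ln(lam) * lam**d, which is positive."""
    return sum(-c * math.log(lam) * lam ** d for c, lam in TERMS)


def solve_distance(score, lo=0.0, hi=2000.0, tol=1e-9):
    """Find d with S(d) = score by bisection (valid because S is increasing)."""
    if not (S(lo) <= score <= S(hi)):
        raise ValueError("score is outside the bracketed range of S")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if S(mid) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


d_star = solve_distance(0.8076)
```

Newton's method with `S_prime` would also work here; bisection is chosen only because it needs no starting guess beyond the bracket.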

The variance of the estimator is $\frac{S^2(d)-S(d)^2}{nS^{\prime}(d)^2} \approx 16215.0$. This variance is very large, as is the corresponding standard deviation (127.34), which is not unexpected for such a short alignment with $n=8$ positions.
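The arithmetic behind the standard deviation, and the effect of the factor $1/n$ in the variance formula, can be sketched as follows. The sketch assumes that the distance $d$ is held fixed, so that only $n$ changes; the helper name `std_for_length` is hypothetical.

```python
import math

variance_n8 = 16215.0  # published variance for the alignment of length 8
n = 8

# Standard deviation quoted in the text.
std_n8 = math.sqrt(variance_n8)

# The variance formula has the form C / n, with
# C = (S^2(d) - S(d)^2) / S'(d)^2 independent of n (for fixed d).
per_position = variance_n8 * n


def std_for_length(length):
    """Standard deviation of the estimator for an alignment of given length."""
    return math.sqrt(per_position / length)


# An alignment 100 times longer reduces the standard deviation tenfold.
std_n800 = std_for_length(800)
```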


Chantal Korostensky
1999-07-14