
Example

We use a random matrix $M$ as an example. For simplicity we will assume that we have only 4 amino acids, i.e. $M$ will be a $4\times 4$ matrix. $M$ is generated to satisfy the normal properties: the columns should add to 1 and the pseudo-symmetry condition $f_i M_{ji} = f_j M_{ij}$ must hold.

\begin{displaymath}M=\left [\begin {array}{cccc} 0.9920& 0.007612& 0.001144& 0.0...
... } 0.002821& 0.003806& 0.003051& 0.9860\end {array}
\right ]
\end{displaymath}
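
The text does not say how $M$ was generated. As a minimal sketch, one construction that satisfies both properties is to draw a frequency vector and a symmetric matrix of small exchange rates, and set $M_{ij} = f_i r_{ij}$ off the diagonal. The function name, the `scale` parameter and the seed below are assumptions made for illustration, not the authors' procedure.

```python
import random

def random_mutation_matrix(n=4, scale=0.01, seed=0):
    """Return (f, M) where M[i][j] holds M_ij (0-based indices).

    Hypothetical construction: M_ij = f_i * r_ij with r symmetric.
    """
    rng = random.Random(seed)
    # Random positive frequencies, normalized to sum to 1.
    raw = [rng.random() + 0.1 for _ in range(n)]
    total = sum(raw)
    f = [x / total for x in raw]
    # Symmetric exchange rates r_ij = r_ji, kept small so the diagonal dominates.
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = r[j][i] = scale * rng.random()
    # Off-diagonal entries: f_j * M_ij = f_j * f_i * r_ij = f_i * M_ji.
    M = [[f[i] * r[i][j] if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Diagonal entries are chosen so that every column adds to 1.
    for j in range(n):
        M[j][j] = 1.0 - sum(M[i][j] for i in range(n) if i != j)
    return f, M

f, M = random_mutation_matrix()
```

Because the off-diagonal entries of each column sum to less than `scale`, the diagonal stays close to 1, which resembles the shape of the matrix printed above.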

The frequency vector f can be computed from the eigenvector of M whose eigenvalue is 1 or from the first row/column of M [1].

\begin{displaymath}f_i = \frac{ M_{i1}} {M_{1i} \sum_j M_{j1}/M_{1j} } \end{displaymath}


\begin{displaymath}f_1=.3003\;\;\;f_2=.1484\;\;\;f_3=.3702\;\;\;f_4=.1811\;\;\;\end{displaymath}
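
The first row/column formula can be sketched in a few lines. Since the full $M$ is not reproduced here, the demonstration uses a hypothetical $3\times 3$ matrix built from a known frequency vector; the function name and the test matrix are assumptions for illustration only. The formula requires every $M_{1i}$ to be nonzero.

```python
def frequencies_from_first_row_and_column(M):
    """f_i = (M_i1 / M_1i) / sum_j (M_j1 / M_1j), with M[i][j] = M_ij (0-based)."""
    ratios = [M[i][0] / M[0][i] for i in range(len(M))]
    total = sum(ratios)
    return [x / total for x in ratios]

# Hypothetical test matrix (NOT the M of the text), built from known frequencies
# so that columns add to 1 and f_i M_ji = f_j M_ij holds.
f_true = [0.5, 0.3, 0.2]
r = [[0.0, 0.02, 0.01],
     [0.02, 0.0, 0.03],
     [0.01, 0.03, 0.0]]
M_demo = [[f_true[i] * r[i][j] for j in range(3)] for i in range(3)]
for j in range(3):
    M_demo[j][j] = 1.0 - sum(M_demo[i][j] for i in range(3) if i != j)

f_demo = frequencies_from_first_row_and_column(M_demo)
# Cross-check: f should also be the eigenvector with eigenvalue 1, i.e. M f = f.
Mf = [sum(M_demo[i][j] * f_demo[j] for j in range(3)) for i in range(3)]
```

Here `f_demo` recovers `f_true` up to rounding, and `Mf` agrees with `f_demo`, which illustrates why the two routes mentioned above give the same vector.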

The eigenvalues of M are

\begin{displaymath}\lambda_1=1.0000\;\;\;\lambda_2=.9914\;\;\;\lambda_3=.9830\;\;\;\lambda_4=.9786\;\;\;\end{displaymath}

In this case we want to find the optimal scoring matrix for $d$ in the range $100 \leq d \leq 200$. After setting $E_{11}=-1$ and $E_{12}=1$ and numerically minimizing $\int_{100}^{200}F(t) dt$ on the rest of the $E_{ij}$ unknowns, we obtain:

\begin{displaymath}E =
\left [\begin {array}{cccc} - 1.0& 1.0& 3.522& 1.720
\\...
...{\medskip } 1.720& 1.331& 1.601&
- 1.596\end {array}\right ]
\end{displaymath}

From these values, using equation 3, we can compute

\begin{displaymath}S(d) = 1.258-.3307(.9786)^d-1.459(.9914)^d-.5098(.9830)^d\end{displaymath}

The derivative of S(d) with respect to d is:

\begin{displaymath}S^{\prime}(d) = .007136(.9786)^d+.01264(.9914)^d+.008734(.9830)^d\end{displaymath}

and we can see that all the terms are strictly positive, so S(d) is strictly increasing for all d. Suppose we have aligned these two sequences:

\begin{displaymath}\begin{array}{c}
A_3A_2A_1A_2A_4A_3A_4A_2\\
A_3A_4A_1A_3A_1A_1A_2A_2\end{array} \end{displaymath}

The score for this alignment is

\begin{displaymath}w/n = (E_{33}+E_{24}+E_{11}+E_{23}+E_{41}+E_{31}+E_{42}+E_{22})/8 =.8076\end{displaymath}
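
The averaging step can be sketched as follows. The middle rows of $E$ are elided in this copy, so the matrix below is a made-up symmetric placeholder (the name `E_toy` and its values are hypothetical) and the result is not the .8076 of the text; only the two sequences are taken from the example.

```python
def alignment_score(E, top, bottom):
    """Average of E over aligned pairs; symbols are 1-based amino acid indices."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return sum(E[a - 1][b - 1] for a, b in zip(top, bottom)) / len(top)

# The two aligned sequences of the example, written as amino acid indices.
top = [3, 2, 1, 2, 4, 3, 4, 2]
bottom = [3, 4, 1, 3, 1, 1, 2, 2]

# Placeholder scoring matrix, NOT the optimized E of the text.
E_toy = [[-1, 1, 2, 1],
         [1, -2, 1, 2],
         [2, 1, -1, 1],
         [1, 2, 1, -2]]

score = alignment_score(E_toy, top, bottom)  # 0.5 for this placeholder matrix
```

The pairs visited are (3,3), (2,4), (1,1), (2,3), (4,1), (3,1), (4,2), (2,2), in the same order as the terms in the sum above.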

From this value, and using any numerical method, we can find the $d^*$ which satisfies the equation

\begin{displaymath}S(d^*) = .8076\end{displaymath}

The solution is

\begin{displaymath}d^*=149.99\end{displaymath}
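
Since $S(d)$ is strictly increasing, bisection is one numerical method that is guaranteed to work. The sketch below uses the coefficients of $S(d)$ exactly as printed, which are rounded to four digits, so it should be expected to land near 150 rather than reproduce 149.99 digit for digit. The function names, the bracket $[0, 1000]$ and the tolerance are assumptions.

```python
# S(d) = S_CONST - sum(c * b**d), with (c, b) pairs as printed (rounded values).
S_CONST = 1.258
S_TERMS = [(0.3307, 0.9786), (1.459, 0.9914), (0.5098, 0.9830)]

def S(d):
    """Expected score per aligned position at distance d."""
    return S_CONST - sum(c * b ** d for c, b in S_TERMS)

def solve_distance(score, lo=0.0, hi=1000.0, tol=1e-9):
    """Bisection for S(d) = score; relies on S being strictly increasing."""
    if not (S(lo) <= score <= S(hi)):
        raise ValueError("score is outside the range of S on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if S(mid) < score:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)

d_star = solve_distance(0.8076)
```

Any bracketing or Newton-type solver would do equally well here; bisection is chosen only because monotonicity makes its convergence unconditional.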

The variance of the estimator is $\frac{S^2(d)-S(d)^2}{nS^{\prime}(d)^2} \approx 16215.0$. This variance is very large, with a standard deviation of 127.34, which is not unexpected for an alignment of only 8 positions.
Chantal Korostensky
1999-07-14