
The CreateDayMatrices Function

For various purposes, including estimating the PAM distance between two aligned sequences, it is necessary to use an array of Dayhoff matrices computed for a range of PAM distances. Chapter [*] - The Pairwise Comparison of Amino Acid Sequences discusses this estimation in greater depth. The Darwin function CreateDayMatrices computes such a range of "enhanced" Dayhoff matrices.






Calling Sequences:
CreateDayMatrices()

Returns: DayMatrix

Synopsis: This function performs the following calculations:
(1) It assigns a Dayhoff matrix computed at PAM distance 250 to the system variable DM.
(2) It computes 1266 Dayhoff matrices for various PAM distances between 0.049 and 1000 and assigns the list of such matrices to the system variable DMS.





The PAM distances for these matrices are not restricted to integers. CreateDayMatrices produces a large number of matrices at low PAM: the first 84 distances, listed below, all lie below PAM 0.31.

> CreateDayMatrices();
> for i from 1 to length(DMS) do
>   printf(' %5.5f', DMS[i][PamNumber]);  
> od;
0.04945 0.05055 0.05167 0.05282 0.05399 0.05519 0.05642 
0.05767 0.05896 0.06027 0.06161 0.06297 0.06437 0.06580 
0.06727 0.06876 0.07029 0.07185 0.07345 0.07508 0.07675 
0.07845 0.08020 0.08198 0.08380 0.08566 0.08757 0.08951 
0.09150 0.09354 0.09561 0.09774 0.09991 0.10213 0.10440 
0.10672 0.10909 0.11152 0.11400 0.11653 0.11912 0.12176 
0.12447 0.12724 0.13006 0.13295 0.13591 0.13893 0.14202 
0.14517 0.14840 0.15170 0.15507 0.15851 0.16204 0.16564 
0.16932 0.17308 0.17693 0.18086 0.18488 0.18899 0.19319 
0.19748 0.20187 0.20635 0.21094 0.21563 0.22042 0.22532 
0.23032 0.23544 0.24067 0.24602 0.25149 0.25708 0.26279 
0.26863 0.27460 0.28070 0.28694 0.29332 0.29983 0.30650 
...
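
The spacing of the values printed above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python rather than Darwin, using only the first seven distances from the listing; the variable names are illustrative, and nothing is assumed about the spacing in the rest of DMS.

```python
# First seven PAM distances copied from the listing above.
pams = [0.04945, 0.05055, 0.05167, 0.05282, 0.05399, 0.05519, 0.05642]

# Ratio of each distance to its predecessor.
ratios = [b / a for a, b in zip(pams, pams[1:])]

for r in ratios:
    print(f"{r:.4f}")
```

Each ratio comes out close to 1.022, so in this range every matrix sits about 2% further out in PAM distance than the one before it.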

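Comparing the 250-PAM original Dayhoff matrix with the 250-PAM "enhanced" Dayhoff matrix reveals that, although similar, the matrices have significant differences. For example, the range of similarity scores narrows from [-7.510, 17.302] in the original to [-5.161, 14.152] in the enhanced matrix.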

> OrigTot := [87, 41, 40, 47, 33, 38, 50, 89, 34, 37, 
>             85, 81, 15, 40, 51, 70, 58, 10, 30, 65];
> OrigFreq := OrigTot/sum(OrigTot);
> OrigDM := CreateOrigDayMatrix(Mutations1978, OrigFreq, 250);
> print(OrigDM);
DayMatrix(Peptide, pam=250, Sim: max=17.302, min=-7.510, max offdiag=6.951,
 del=-19.814-1.396*(k-1))
C  12.0
S  -0.0  1.6
T  -2.2  1.3  2.6
P  -2.7  0.9  0.3  5.9
A  -2.0  1.1  1.2  1.1  1.8
G  -3.3  1.1 -0.0 -0.5  1.3  4.8
N  -3.6  0.7  0.4 -0.5  0.2  0.4  2.0
D  -5.1  0.3 -0.1 -1.0  0.3  0.6  2.1  3.9
E  -5.3 -0.0 -0.4 -0.6  0.3  0.2  1.4  3.4  3.9
Q  -5.3 -0.5 -0.8  0.2 -0.4 -1.2  0.8  1.6  2.5  4.1
H  -3.4 -0.8 -1.3 -0.3 -1.4 -2.1  1.6  0.7  0.6  2.9  6.6
R  -3.6 -0.3 -0.9 -0.2 -1.6 -2.6 -0.0 -1.3 -1.1  1.2  1.5  6.1
K  -5.4 -0.2 -0.0 -1.2 -1.2 -1.7  1.0  0.1 -0.1  0.7 -0.1  3.4  4.7
M  -5.2 -1.6 -0.6 -2.1 -1.2 -2.8 -1.8 -2.6 -2.2 -1.0 -2.2 -0.5  0.4  6.6
I  -2.3 -1.4  0.1 -2.0 -0.5 -2.6 -1.8 -2.4 -2.0 -2.0 -2.5 -2.0 -1.9  2.2  4.6
L  -6.0 -2.8 -1.7 -2.6 -1.9 -4.0 -2.9 -4.0 -3.3 -1.8 -2.1 -3.0 -2.9  3.7  2.4  6.0
V  -1.9 -1.0  0.3 -1.2  0.2 -1.4 -1.8 -2.2 -1.8 -1.9 -2.3 -2.5 -2.5  1.8  3.7  1.8  4.3
F  -4.3 -3.2 -3.1 -4.6 -3.5 -4.8 -3.5 -5.6 -5.4 -4.7 -1.8 -4.5 -5.3  0.2  1.0  1.8 -1.2  9.1
Y   0.4 -2.8 -2.8 -5.0 -3.5 -5.3 -2.1 -4.3 -4.3 -4.0 -0.1 -4.2 -4.5 -2.5 -1.0 -0.9 -2.5  7.0 10.2
W  -7.5 -2.3 -5.0 -5.5 -5.6 -6.8 -3.9 -6.6 -6.8 -4.6 -2.5  2.3 -3.3 -4.1 -5.0 -1.7 -6.1  0.5  0.0 17.3

> print(DM);
DayMatrix(Peptide, pam=250, Sim: max=14.152, min=-5.161, max offdiag=5.080,
 del=-19.814-1.396*(k-1))
C  11.5
S   0.1  2.2
T  -0.5  1.5  2.5
P  -3.1  0.4  0.1  7.6
A   0.5  1.1  0.6  0.3  2.4
G  -2.0  0.4 -1.1 -1.6  0.5  6.6
N  -1.8  0.9  0.5 -0.9 -0.3  0.4  3.8
D  -3.2  0.5 -0.0 -0.7 -0.3  0.1  2.2  4.7
E  -3.0  0.2 -0.1 -0.5 -0.0 -0.8  0.9  2.7  3.6
Q  -2.4  0.2  0.0 -0.2 -0.2 -1.0  0.7  0.9  1.7  2.7
H  -1.3 -0.2 -0.3 -1.1 -0.8 -1.4  1.2  0.4  0.4  1.2  6.0
R  -2.2 -0.2 -0.2 -0.9 -0.6 -1.0  0.3 -0.3  0.4  1.5  0.6  4.7
K  -2.8  0.1  0.1 -0.6 -0.4 -1.1  0.8  0.5  1.2  1.5  0.6  2.7  3.2
M  -0.9 -1.4 -0.6 -2.4 -0.7 -3.5 -2.2 -3.0 -2.0 -1.0 -1.3 -1.7 -1.4  4.3
I  -1.1 -1.8 -0.6 -2.6 -0.8 -4.5 -2.8 -3.8 -2.7 -1.9 -2.2 -2.4 -2.1  2.5  4.0
L  -1.5 -2.1 -1.3 -2.3 -1.2 -4.4 -3.0 -4.0 -2.8 -1.6 -1.9 -2.2 -2.1  2.8  2.8  4.0
V  -0.0 -1.0  0.0 -1.8  0.1 -3.3 -2.2 -2.9 -1.9 -1.5 -2.0 -2.0 -1.7  1.6  3.1  1.8  3.4
F  -0.8 -2.8 -2.2 -3.8 -2.3 -5.2 -3.1 -4.5 -3.9 -2.6 -0.1 -3.2 -3.3  1.6  1.0  2.0  0.1  7.0
Y  -0.5 -1.9 -1.9 -3.1 -2.2 -4.0 -1.4 -2.8 -2.7 -1.7  2.2 -1.8 -2.1 -0.2 -0.7 -0.0 -1.1  5.1  7.8
W  -1.0 -3.3 -3.5 -5.0 -3.6 -4.0 -3.6 -5.2 -4.3 -2.7 -0.8 -1.6 -3.5 -1.0 -1.8 -0.7 -2.6  3.6  4.1 14.2
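
To put numbers on these differences, the Python sketch below compares a handful of entries copied from the two printouts above. The choice of entries and the names orig and enhanced are illustrative only; this is not a Darwin session.

```python
# (row, column) -> score, copied from the two 250-PAM printouts above.
orig = {("C", "C"): 12.0, ("P", "P"): 5.9, ("G", "G"): 4.8,
        ("W", "W"): 17.3, ("W", "C"): -7.5}
enhanced = {("C", "C"): 11.5, ("P", "P"): 7.6, ("G", "G"): 6.6,
            ("W", "W"): 14.2, ("W", "C"): -1.0}

# Enhanced minus original, rounded to the one decimal the printouts carry.
diff = {pair: round(enhanced[pair] - orig[pair], 1) for pair in orig}

# Print the largest shifts first.
for pair, d in sorted(diff.items(), key=lambda kv: -abs(kv[1])):
    print(pair, d)
```

Among these entries the largest shift is the W/C score, which rises from -7.5 in the original matrix to -1.0 in the enhanced one.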

To extract the entry of DMS for a particular PAM distance, use the function
SearchDayMatrix(p : real, D : array(DayMatrix))
where p is the target PAM distance and D is the array of Dayhoff matrices.

> print(SearchDayMatrix(250, DMS));
DayMatrix(Peptide, pam=250, Sim: max=14.152, min=-5.161, 
 max offdiag=5.080, del=-19.814-1.396*(k-1))
...
The matrix returned is equivalent to the one assigned to the variable DM.
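
This section does not say how SearchDayMatrix chooses an entry when p falls between two tabulated distances. As a rough illustration only, the Python sketch below assumes a nearest-distance rule over an ascending list. The helper search_pam is hypothetical, not a Darwin function, and the grid is just the first seven distances printed earlier.

```python
from bisect import bisect_left

def search_pam(target, pams):
    """Return the index of the entry in ascending list `pams` nearest to `target`.

    Assumption: nearest-distance selection; Darwin's actual rule may differ.
    """
    pos = bisect_left(pams, target)
    if pos == 0:                  # target at or below the smallest distance
        return 0
    if pos == len(pams):          # target above the largest distance
        return len(pams) - 1
    # Otherwise pick whichever neighbour is closer to the target.
    if pams[pos] - target < target - pams[pos - 1]:
        return pos
    return pos - 1

grid = [0.04945, 0.05055, 0.05167, 0.05282, 0.05399, 0.05519, 0.05642]
i = search_pam(0.052, grid)
print(i, grid[i])
```

Because the list is sorted, a binary search finds the neighbouring entries without scanning all 1266 matrices.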






CreateDayMatrices must be called before you attempt to match two sequences. Unless memory on your system is extremely limited, make a habit of calling this function at the start of every session.





It is instructive to examine how these matrices change from low PAM distance, where the matrix is positive on the diagonal and strongly negative off the diagonal, to high PAM distance, where the entries flatten toward zero.

> print(DMS[1]);
DayMatrix(Peptide, pam=0.0494497, Sim: max=18.839, min=-50.808,
 max offdiag=-24.377, del=-47.348-1.396*(k-1))
C  17.2
S -31.5 12.2
T -34.6-25.7 12.1
P -46.5-31.6-32.5 13.4
A -31.1-27.3-30.5-31.8 11.1
G -38.3-31.8-38.4-37.9-31.3 11.3
N -37.2-28.6-30.6-37.1-35.4-32.2 13.4
D -45.3-31.8-33.1-35.8-34.3-33.5-27.0 12.7
E -48.7-32.4-33.8-34.7-31.7-36.8-32.6-25.8 12.3
Q -41.8-31.4-32.0-32.7-32.4-35.9-30.4-31.8-26.2 14.2
H -35.2-33.3-32.8-35.8-35.2-37.1-28.4-32.5-32.4-27.3 16.3
R -37.3-33.5-33.7-36.2-34.8-35.5-33.4-37.5-34.1-28.2-31.3 12.7
K -46.6-32.6-31.4-34.8-33.9-37.0-29.7-33.5-29.0-27.2-31.6-25.6 12.3
M -34.6-34.7-32.3-44.2-32.4-40.7-38.4-45.9-36.7-30.8-34.2-37.3-33.7 16.4
I -38.6-39.3-33.4-40.6-38.1-47.8-38.6-46.9-40.3-38.9-38.4-40.0-37.3-26.4 12.4
L -37.6-38.8-37.0-36.9-35.4-44.1-40.8-48.5-40.5-34.0-37.5-37.1-37.4-25.9-27.3 10.3
V -32.4-37.4-30.1-37.7-29.7-42.6-40.3-47.1-35.7-37.0-39.6-37.8-37.2-30.6-24.4-30.1 11.6
F -35.0-41.1-37.8-42.6-38.2-45.3-39.5-46.2-45.6-39.4-33.6-44.2-43.4-29.5-32.5-29.3-34.8 13.9
Y -34.1-35.0-37.8-39.7-38.8-42.7-34.6-38.9-40.2-37.2-27.2-36.1-37.7-35.7-36.8-35.5-36.0-24.7 14.8
W -35.5-38.4-41.7-45.2-41.6-39.6-41.4-50.8-42.6-37.1-35.9-34.5-43.7-35.9-38.4-35.9-41.7-28.9-28.2 18.8
      C    S    T    P    A    G    N    D    E    Q    H    R    K    M    I    L    V    F    Y    W

> print(DMS[length(DMS)]);
DayMatrix(Peptide, pam=1000, Sim: max=3.184, min=-0.454, max offdiag=0.882,
 del=-15.338-1.396*(k-1))
C  0.9
S -0.0  0.0
T -0.0  0.0  0.0
P -0.1  0.1  0.0  0.3
A  0.0  0.0  0.0  0.0  0.0
G -0.1  0.1  0.0  0.1  0.1  0.5
N -0.1  0.1  0.0  0.1  0.0  0.1  0.1
D -0.1  0.1  0.0  0.1  0.0  0.2  0.1  0.2
E -0.1  0.1  0.0  0.1  0.0  0.1  0.1  0.1  0.1
Q -0.1  0.0  0.0  0.0  0.0  0.1  0.1  0.1  0.1  0.1
H -0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
R -0.1  0.0  0.0  0.0  0.0  0.1  0.1  0.1  0.1  0.1  0.0  0.1
K -0.1  0.0  0.0  0.1  0.0  0.1  0.1  0.1  0.1  0.1  0.0  0.1  0.1
M  0.0 -0.1 -0.0 -0.1 -0.0 -0.2 -0.1 -0.2 -0.1 -0.1 -0.0 -0.1 -0.1 0.2
I  0.0 -0.1 -0.0 -0.1 -0.0 -0.3 -0.1 -0.2 -0.1 -0.1 -0.1 -0.1 -0.1 0.2 0.2
L  0.0 -0.1 -0.0 -0.1 -0.1 -0.3 -0.1 -0.2 -0.2 -0.1 -0.1 -0.1 -0.1 0.2 0.3 0.3
V  0.1 -0.1 -0.0 -0.1 -0.0 -0.2 -0.1 -0.1 -0.1 -0.1 -0.0 -0.1 -0.1 0.1 0.2 0.2 0.1
F  0.1 -0.2 -0.1 -0.2 -0.1 -0.4 -0.2 -0.3 -0.2 -0.1 -0.0 -0.2 -0.2 0.2 0.3 0.3 0.2 0.6
Y  0.1 -0.1 -0.1 -0.2 -0.1 -0.3 -0.1 -0.2 -0.2 -0.1  0.0 -0.1 -0.1 0.2 0.2 0.2 0.1 0.5 0.5
W  0.1 -0.2 -0.2 -0.4 -0.2 -0.5 -0.3 -0.4 -0.3 -0.2  0.0 -0.2 -0.3 0.2 0.2 0.2 0.1 0.9 0.9 3.2
   C    S    T    P    A    G    N    D    E    Q    H    R    K   M  I   L   V   F   Y   W
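
The flattening can be read directly off the header lines of the three matrices printed in this section. As a small illustration in Python rather than Darwin, the sketch below computes the spread between the max and min similarity reported in each header; the values are copied from those headers and the variable names are illustrative.

```python
# PAM distance -> (max, min) similarity, copied from the DayMatrix headers above.
headers = {
    0.0494497: (18.839, -50.808),
    250: (14.152, -5.161),
    1000: (3.184, -0.454),
}

# Spread of scores at each distance, rounded to the precision of the headers.
spread = {pam: round(hi - lo, 3) for pam, (hi, lo) in headers.items()}

for pam in sorted(spread):
    print(pam, spread[pam])
```

The spread shrinks from roughly 70 at PAM 0.049 to under 4 at PAM 1000.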


Gaston Gonnet
1998-09-15