Next: Modeling Evolution Up: Darwin and Problems from Biochemistry Previous: Darwin and Problems from Biochemistry

   
Point Accepted Mutations and Dayhoff Matrices

This chapter is mainly concerned with the definition and construction of the matrices used to score the quality of an alignment of two amino acid sequences. Typically, these similarity matrices contain, for every pair of amino acids i and j, a value proportional to the probability that amino acid i mutates into amino acid j.

The construction of such matrices is straightforward and natural. By examining a large sample of verified pairwise alignments of amino acids, we can extract frequency information of the form

amino acid i mutated into amino acid j

If the sample is large enough to be statistically significant and contains a diverse range of example alignments, then the resulting matrices should reflect the true probabilities of mutations occurring over a period of evolution.
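The counting step described above can be sketched in a few lines of Python. This is only an illustration, not Darwin's own routine: the function name `count_substitutions`, the use of `-` as a gap symbol, the symmetric counting of each pair, and the toy alignments are all assumptions made for the example.

```python
from collections import Counter

def count_substitutions(alignments):
    """Count how often residue i is aligned against residue j.

    `alignments` is an iterable of (seq1, seq2) pairs of equal length.
    Positions containing a gap ('-', an assumed convention) are skipped,
    since only substitutions are counted here.
    """
    counts = Counter()
    for seq1, seq2 in alignments:
        if len(seq1) != len(seq2):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(seq1, seq2):
            if a == "-" or b == "-":
                continue
            # Assumption: the direction of evolution is unknown, so a
            # mismatch (a, b) is also recorded as (b, a).
            counts[(a, b)] += 1
            if a != b:
                counts[(b, a)] += 1
    return counts

# Hypothetical toy data, not real verified alignments.
toy = [("ACDE", "ACEE"), ("AC-E", "GCDE")]
counts = count_substitutions(toy)
```

Dividing such counts by suitable totals would give frequency estimates; how the counts are normalised is a modelling choice that this sketch deliberately leaves open.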

An alignment between two amino acid sequences might look as follows

\begin{displaymath}\begin{array}{lcccccc}
\mbox{sequence 1:} & A_1 & A_2 & A_3 & A_4 & A_5 & A_6 \\
\mbox{sequence 2:} & B_1 & B_2 & B_3 & B_4 & B_5 & B_6
\end{array} \end{displaymath}

where A_i is an amino acid from sequence 1 aligned against amino acid B_i in sequence 2. Each such pair of matched amino acids is assigned a score from a similarity matrix. The score for the entire alignment is the sum of the scores of the individually matched pairs. Here, one typically uses the Dayhoff similarity matrix [9], since its entries have some nice algebraic properties which are exploited by the algorithms.
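As a small illustration of the additive scoring rule, the following Python sketch sums per-position scores over an ungapped alignment. The function `alignment_score` and the values in `toy_matrix` are hypothetical and chosen only for the example; they are not entries of an actual Dayhoff matrix.

```python
def alignment_score(seq1, seq2, matrix):
    """Sum the similarity scores of the aligned pairs (A_i, B_i)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(matrix[(a, b)] for a, b in zip(seq1, seq2))

# Hypothetical symmetric scores for a two-letter alphabet.
toy_matrix = {
    ("A", "A"): 2,
    ("G", "G"): 6,
    ("A", "G"): 0,
    ("G", "A"): 0,
}

# Pairs are A/A, G/G and A/G, so the total is 2 + 6 + 0.
total = alignment_score("AGA", "AGG", toy_matrix)
```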

The best (or maximum likelihood) alignment between two sequences is the alignment which is most probable, i.e., the alignment with the highest score relative to the Dayhoff matrix. Using the classic Needleman and Wunsch [23] dynamic programming algorithm, we are able to find this maximum quickly and efficiently. The construction of such alignments is explored in depth in Chapter [*] - The Pairwise Alignment of Sequences.
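To make the dynamic programming idea concrete, here is a minimal Python sketch of a Needleman-Wunsch style recurrence that returns only the optimal global score. It makes several assumptions not taken from this chapter: a constant per-position gap penalty `gap`, a hypothetical scoring function `toy_score` in place of a Dayhoff matrix, and no traceback to recover the alignment itself.

```python
def needleman_wunsch_score(s, t, score, gap=-4):
    """Best global alignment score of s and t under a linear gap penalty."""
    n, m = len(s), len(t)
    # best[i][j] holds the best score for aligning s[:i] with t[:j].
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap          # s[:i] aligned against gaps only
    for j in range(1, m + 1):
        best[0][j] = j * gap          # t[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # match or substitution
                best[i - 1][j] + gap,                            # gap in t
                best[i][j - 1] + gap,                            # gap in s
            )
    return best[n][m]

def toy_score(a, b):
    # Hypothetical scores: reward identity, penalise any substitution.
    return 2 if a == b else -1

identical = needleman_wunsch_score("ACD", "ACD", toy_score)
one_gap = needleman_wunsch_score("ACD", "AD", toy_score)
```

The table has (n+1)(m+1) cells and each is filled in constant time, which is why the maximum can be found efficiently.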

The following sections provide part of the groundwork necessary for performing pairwise alignments. We begin by presenting our mathematical model of evolution and measures for the amount of evolution. We then discuss the routines available in Darwin for the construction of the first Dayhoff matrix [9] and explain how the original method can be improved upon.

In this chapter we are concerned solely with mutation events which are point accepted mutations or, as they are sometimes called in the literature, substitutions. Chapter [*], Insertions and Deletions, describes our model for the two other forms of mutation: insertions and deletions.



 
Gaston Gonnet
1998-09-15