Next: Modeling Evolution
Up: Darwin and Problems from Biochemistry
Previous: Darwin and Problems from Biochemistry
Point Accepted Mutations and Dayhoff Matrices
This chapter is mainly concerned with the definition and construction
of the matrices used to score the quality of an alignment of two
amino acid sequences. Typically, for every pair of amino acids i and j,
such a similarity matrix contains a value derived from the probability
that amino acid i mutates into amino acid j.
The construction of such matrices is straightforward and natural. By
examining a large sample of verified pairwise alignments of amino
acids, we can extract frequency information of the form
"amino acid i mutated into amino acid j".
If the sample is large enough to be statistically significant and
contains a diverse range of example alignments, then the resulting
matrices should reflect the true probabilities of mutations occurring
over a period of evolution.
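The frequency extraction described above can be sketched in a few lines of Python. This is only an illustration, not the Darwin routine: the function names, the `-` gap character, and the choice to count each aligned pair in both directions (because the direction of a mutation cannot be observed) are assumptions made for the example.

```python
from collections import Counter


def count_substitutions(alignments, gap="-"):
    """Count how often residue a is aligned against residue b.

    `alignments` is an iterable of (seq1, seq2) pairs of equal length.
    Positions containing the (assumed) gap character are skipped,
    because only substitutions are counted here.
    """
    counts = Counter()
    for s1, s2 in alignments:
        if len(s1) != len(s2):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(s1, s2):
            if a == gap or b == gap:
                continue
            # Assumption: the direction of a mutation is unknown,
            # so each aligned pair is counted symmetrically.
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def relative_frequencies(counts):
    """Turn raw pair counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}
```

With a large and diverse sample, the relative frequencies returned by `relative_frequencies` would be the raw material for estimating mutation probabilities. A tiny sample such as `[("ACDE", "ACDF")]` only shows the mechanics.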
An alignment between two amino acid sequences pairs the sequences
position by position: Ai, an amino acid from sequence 1, is aligned
against amino acid Bi in sequence 2. Each such matching between two
amino acids is assigned a score from a similarity matrix. The score for
the entire match is the sum of the scores of the individually matched
amino acids. Here, one typically uses the Dayhoff similarity matrix
[9] since the entries have some nice algebraic
properties which are exploited by the algorithms.
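As a minimal sketch of this scoring rule, the Python function below sums the matrix entries of the matched pairs. The dictionary representation of the matrix and the numbers in the usage example are made up for illustration; they are not entries of an actual Dayhoff matrix.

```python
def alignment_score(seq1, seq2, score):
    """Sum the similarity scores of the individually matched residues.

    `score` maps a pair (a, b) of residues to a number; in this sketch
    it is a plain dictionary standing in for a similarity matrix.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return sum(score[(a, b)] for a, b in zip(seq1, seq2))
```

For example, with the toy matrix `{("A", "A"): 2, ("A", "C"): -1, ("C", "A"): -1, ("C", "C"): 3}`, aligning `"AAC"` against `"ACC"` scores 2 + (-1) + 3 = 4.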
The best (or maximum likelihood) alignment between two sequences is
the alignment which is most probable, i.e. the alignment with the
highest score relative to the Dayhoff matrix. Using the classic Needleman and
Wunsch [23] dynamic programming algorithm, we are
able to find this maximum efficiently.
The construction of such alignments is explored in depth in
the chapter The Pairwise Alignment of Sequences.
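To give a feel for the dynamic programming idea, here is a simplified Python sketch in the style of Needleman and Wunsch. It is not Darwin's implementation: the linear gap penalty `gap` and the dictionary form of the similarity matrix are assumptions for the example, and the treatment of insertions and deletions used in practice is described elsewhere in the book.

```python
def needleman_wunsch_score(s, t, score, gap=-4):
    """Best global alignment score of s and t.

    `score` maps residue pairs to similarity values; `gap` is an
    assumed constant penalty per inserted or deleted residue.
    """
    m, n = len(s), len(t)
    # best[i][j] holds the best score for aligning s[:i] with t[:j].
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        best[i][0] = i * gap
    for j in range(1, n + 1):
        best[0][j] = j * gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score[(s[i - 1], t[j - 1])],  # match s[i-1] with t[j-1]
                best[i - 1][j] + gap,  # s[i-1] aligned against a gap
                best[i][j - 1] + gap,  # t[j-1] aligned against a gap
            )
    return best[m][n]
```

The table has (m + 1) by (n + 1) cells and each cell is filled in constant time, so this simplified version runs in time proportional to the product of the two sequence lengths.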
The following sections provide part of the groundwork necessary for
performing pairwise alignments. We begin by presenting our
mathematical model of evolution and measures for the amount of
evolution. We then discuss the routines available in Darwin for the
construction of the first Dayhoff matrix [9] and
explain how the original method can be improved upon.
In this chapter we are solely concerned with the mutation events known
as point accepted mutations or, as they are sometimes called
in the literature, substitutions. The chapter
Insertions and Deletions describes our model for the two other
forms of mutation: insertions and deletions.
Gaston Gonnet
1998-09-15