The Dayhoff matrix computed by Dayhoff et al. [9] was based on too few matched amino acid pairs to sustain an analysis of substitution rates any more sophisticated than that implied by the Markov model. Today it is relatively easy, and computationally feasible, to gather on the order of millions of matched amino acid pairs. A ``self-matching'' of a single complete database (such as Swiss-Prot) is typically enough to gather a sufficient amount of data.
The article ``Analysis of mutation during divergent evolution'' by
Gonnet, Cohen and Benner (1992) [16]
details the first exhaustive ``self-matching'' of the Swiss-Prot
version 23
database. At that time, Swiss-Prot consisted of approximately
27,000 sequences. New Dayhoff matrices were formed from the
frequency information contained in the
alignments that remained after each had been inspected by hand and
suspect alignments had been removed (those thought not to be true
alignments, or not to result from point mutation, insertion or deletion events).
The section Estimating Mutation Matrices
describes their methodology. This section explores the Darwin
commands for building such matrices.
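
As a rough illustration of how frequency information from alignments can be turned into a matrix, the following Python sketch counts aligned residue pairs and derives a column-normalized substitution matrix and log-odds scores. It is not Darwin code and not the procedure of [16]: the function names, the toy alignments, the handling of gaps (columns containing `-` are skipped) and the 10*log10 scaling are all assumptions made for the example, and no normalization to a particular PAM distance is attempted.

```python
import math
from collections import Counter

def count_pairs(alignments):
    """Symmetric counts of aligned residue pairs; columns with a gap are skipped."""
    counts = Counter()
    for a, b in alignments:
        for x, y in zip(a, b):
            if x == "-" or y == "-":
                continue
            # Direction of change is unknown, so count the pair both ways.
            counts[(x, y)] += 1
            counts[(y, x)] += 1
    return counts

def mutation_matrix(counts):
    """M[(i, j)]: fraction of residues j observed aligned with residue i.
    Each column j sums to 1."""
    col_totals = Counter()
    for (i, j), n in counts.items():
        col_totals[j] += n
    return {(i, j): n / col_totals[j] for (i, j), n in counts.items()}

def log_odds(counts):
    """S[(i, j)] = 10 * log10( f_ij / (f_i * f_j) ) for every observed pair."""
    total = sum(counts.values())
    marginal = Counter()
    for (i, j), n in counts.items():
        marginal[i] += n
    return {
        (i, j): 10 * math.log10((n / total) /
                                ((marginal[i] / total) * (marginal[j] / total)))
        for (i, j), n in counts.items()
    }

# Toy data (hypothetical): two short pairwise alignments over residues A and G.
alignments = [("AAGA", "AAGG"), ("A-G", "AGG")]
counts = count_pairs(alignments)
M = mutation_matrix(counts)
S = log_odds(counts)

for pair in sorted(S):
    print(pair, round(M[pair], 3), round(S[pair], 2))
```

With real data the counts would come from the millions of matched pairs mentioned above, and pairs that never occur would need separate treatment, since the sketch only scores pairs it has actually observed.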