Optimal Scoring Matrices for Estimating Distances Between Aligned Sequences
Gaston Gonnet and Chantal Korostensky
e-mail: {gonnet, korosten}@inf.ethz.ch
Abstract:
Sequence alignment is typically the first step in many areas of
bioinformatics research, and it yields some form of score or
distance. These scores and distances are often used for
evolutionary tree construction, multiple sequence alignment,
all-against-all comparisons of whole genomes, and many other tasks.
Since these scores and distances are the basis for further
studies, it is important that they be estimated as accurately as
possible. In this paper, we prove that the scores obtained from
Dayhoff matrices (or from any other matrix) are not consistent for
tree construction. Then we show how this can be corrected and how
to create an optimal scoring matrix to estimate distances. This
scoring matrix is optimal within a large class of estimators.
Finally, we present a complete example.
Chantal Korostensky
1999-07-14