
Optimal Scoring Matrices for Estimating Distances Between Aligned Sequences

Gaston Gonnet and Chantal Korostensky
e-mail: {gonnet, korosten}


Sequence alignment is typically the first step in many areas of bioinformatics research, and it usually yields some form of score or distance. These scores and distances are then used for evolutionary tree construction, multiple sequence alignment, all-against-all comparisons of whole genomes, and many other tasks. Because further studies build on them, it is important that they be estimated as accurately as possible. In this paper, we prove that the scores obtained from Dayhoff matrices (or from any other scoring matrix) are not consistent for tree construction. We then show how this can be corrected and how to create an optimal scoring matrix for estimating distances. This scoring matrix is optimal within a large class of estimators. Finally, we present a complete example.

