The use of heuristics in constructing MSAs creates a third problem, one centering on evaluation. Before using an MSA or tree built by a heuristic, one would like to know how closely the heuristic has approximated an optimal MSA or tree. Even today, it is common for biochemists to evaluate by eye (and adjust by hand) the output of MSA tools. This approach is clearly inadequate for any systematic reconstruction of natural history from genomic sequence data; a formal method for judging the quality of an MSA is needed. Accordingly, a variety of groups have proposed or used scoring functions [Sankoff and Cedergren, 1983; Altschul, 1989; Thompson et al., 1994; Higgins and Sharp, 1989; Carillo and Lipman, 1988; Gupta et al., 1996] that assess the quality of an MSA. In this paper we are interested in MSAs when no tree is available. In this case, the most commonly used function follows a simple approach: it examines every pair of proteins in the family, scores the pairwise alignment of each pair using a Dayhoff matrix [Dayhoff et al., 1978], and scores the MSA by summing the scores of all the pairwise alignments. We shall call these ``sum of pairs'' (SP) methods.
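To make the SP idea concrete, the following is a minimal sketch of an SP scoring function, not the implementation used by any of the cited tools. It rests on several assumptions that are ours and not part of the text above: the MSA is given as equal-length strings with `-` marking a gap, the substitution scores are toy values standing in for a real Dayhoff matrix, a residue aligned to a gap receives a fixed penalty, and columns in which both sequences have a gap are ignored. All names (`TOY_MATCH`, `pair_score`, `sp_score`, and so on) are hypothetical.

```python
from itertools import combinations

# Toy scores, assumed for illustration only. A real SP method would look
# up each residue pair in a Dayhoff (PAM) matrix instead.
TOY_MATCH = 2
TOY_MISMATCH = -1
TOY_GAP = -2  # assumed fixed penalty for a residue aligned to a gap


def pair_score(a, b):
    """Score one column of the pairwise alignment of two MSA rows."""
    if a == "-" and b == "-":
        return 0  # both rows have a gap here: the column is ignored
    if a == "-" or b == "-":
        return TOY_GAP
    return TOY_MATCH if a == b else TOY_MISMATCH


def pairwise_score(row_a, row_b):
    """Score the pairwise alignment of two rows, column by column."""
    return sum(pair_score(a, b) for a, b in zip(row_a, row_b))


def sp_score(msa):
    """Sum the pairwise alignment scores over every pair of rows."""
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows of an MSA must have the same length")
    return sum(pairwise_score(x, y) for x, y in combinations(msa, 2))


if __name__ == "__main__":
    print(sp_score(["AC-T", "ACGT", "A-GT"]))
```

With these toy values, the three pairwise alignments in the usage example score 4, 0, and 4, so the SP score of the alignment is 8. Note that the number of pairs grows quadratically with the number of sequences in the family.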