
Simulation of Evolution

To illustrate how the scoring function can be used, a variety of tools for generating MSAs were challenged with a set of protein families simulated under a Markovian model of evolution, and the output of each tool was evaluated using the CS measure. This provides, of course, only an approximate assessment of the MSA tools themselves; a better assessment must come from actual experimental sequence data.

Random trees with a given structure and edge lengths were generated, together with a random sequence at the root. Starting from the root sequence, mutations, insertions and deletions of different sizes were introduced according to the lengths of the edges of the tree, so that a new sequence was generated at each internal node. At the end of the simulation, only the sequences at the leaves were retained. Since the positions of the insertions and deletions as well as the "real" tree are known, the correct MSA is known as well.

The retained leaf sequences can be given to different algorithms (MSA [Gupta et al., 1996, Lipman et al., 1989], MAP [Huang, 1994], ClustalW [Higgins and Sharp, 1989, Thompson et al., 1994] and the probabilistic model (PAS) [Gonnet and Benner, 1996, Gonnet, 1994a]), and the score of each calculated MSA can be compared to the score of the "real" (generated) MSA using the CS measure. The results for three combinations of trees and sizes are shown (Figures 12-14); they are representative of all other results.

The circular order was always derived using a TSP algorithm, not from the generated tree. Since the correct tree is known, however, it was easy to verify that the TSP order is in fact a circular order, which was always the case.
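The simulation procedure described in this section can be sketched in a few lines of Python. This is only a minimal illustration, not the program used for the experiments: the names (Simulator, random_tree, simulate, true_msa), the rates, the maximum indel size and the uniform substitution model are all assumptions made for the example, whereas the actual simulation follows a Markovian model of evolution. The point of the sketch is the bookkeeping: every residue carries a column identifier, so the correct MSA can be read off the leaf sequences at the end.

```python
import random

AMINO = "ACDEFGHIKLMNPQRSTVWY"

class Simulator:
    """Evolves sequences while tracking which alignment column each residue belongs to."""
    def __init__(self, seed=0, sub_rate=0.5, indel_rate=0.05, max_indel=3):
        # All rates are illustrative assumptions, not values from the paper.
        self.rng = random.Random(seed)
        self.sub_rate, self.indel_rate, self.max_indel = sub_rate, indel_rate, max_indel
        self.order = []      # global left-to-right order of all column ids
        self.next_id = 0

    def new_site(self, pos):
        """Create a new column at position pos of the global order."""
        sid, self.next_id = self.next_id, self.next_id + 1
        self.order.insert(pos, sid)
        return sid, self.rng.choice(AMINO)

    def root(self, length):
        return [self.new_site(len(self.order)) for _ in range(length)]

    def evolve(self, seq, t):
        """Return a mutated copy of seq; event probabilities grow with edge length t."""
        p_sub = min(1.0, self.sub_rate * t)
        p_indel = min(1.0, self.indel_rate * t)
        out, skip = [], 0
        for sid, ch in seq:
            if skip:                               # inside a running deletion
                skip -= 1
                continue
            if self.rng.random() < p_indel:        # a deletion starts here
                skip = self.rng.randint(1, self.max_indel) - 1
                continue
            if self.rng.random() < p_sub:          # substitution (uniform, simplified)
                ch = self.rng.choice(AMINO)
            out.append((sid, ch))
            if self.rng.random() < p_indel:        # insertion right after this site
                at = self.order.index(sid) + 1
                for _ in range(self.rng.randint(1, self.max_indel)):
                    out.append(self.new_site(at))
                    at += 1
        return out

def random_tree(rng, n_leaves, mean_len=0.3):
    """Random binary tree: a leaf is a name, an internal node is two (child, edge_length) pairs."""
    nodes = [f"s{i}" for i in range(n_leaves)]
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append(((a, rng.expovariate(1 / mean_len)),
                      (b, rng.expovariate(1 / mean_len))))
    return nodes[0]

def simulate(tree, sim, root_len=30):
    """Evolve a random root sequence down the tree and keep only the leaf sequences."""
    leaves = {}
    def walk(node, seq):
        if isinstance(node, str):
            leaves[node] = seq
            return
        for child, t in node:
            walk(child, sim.evolve(seq, t))
    walk(tree, sim.root(root_len))
    return leaves

def true_msa(leaves, order):
    """Build the correct MSA from the column ids, dropping columns absent from all leaves."""
    used = {sid for seq in leaves.values() for sid, _ in seq}
    cols = [sid for sid in order if sid in used]
    return {name: "".join(dict(seq).get(sid, "-") for sid in cols)
            for name, seq in leaves.items()}

sim = Simulator(seed=1)
tree = random_tree(sim.rng, 5)
leaves = simulate(tree, sim)
for name, row in sorted(true_msa(leaves, sim.order).items()):
    print(name, row)
```

Removing the gaps from each row of the result gives back the unaligned leaf sequences, which are what would be handed to the alignment programs; the gapped rows play the role of the "real" MSA against which their output is scored.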



Chantal Korostensky
1999-07-14