The DynProg Function

Next: The BackDynProg Function Up: Dynamic Programming Previous: Dynamic Programming

The `DynProg` Function

We want to find the alignment between the following two amino acid sequences that yields the highest score. For this example, we will use a Dayhoff matrix computed at PAM 250.

> CreateDayMatrix(logPAM1, 250);     # compute the Dayhoff matrix DM
> x := 'TCIYGH';
> y := 'TWMRH';

The function DynProg performs dynamic programming in Darwin.

Calling Sequences:
DynProg( seq₁, seq₂, DM)
DynProg( seq₁, seq₂, DM, l₁, l₂)
Parameters:

seq₁, seq₂	:	`string`
DM	:	`DayMatrix`
l₁, l₂	:	`posint`

Returns: list - three elements (1) score, (2), length of match for sequence 1 and length of match for sequence 2.

Synposis: If no lengths are given as arguments ( l₁, l₂are not passed), the DynProg function computes the highest similarity score for the two sequences (or two subsequences of seq₁ and seq₂). This is referred to as the best global alignment.

If the lengths are given as arguments, then DynProg finds the alignment of length max(l₁, l₂) with maximum score. The alignment is guaranteed to use l₁ elements of seq₁ and l₂ of seq₂. This is referred to as the best local alignment.

We begin by tracing through an alignment of sequences x and y. First, we ask Darwin to compute the maximum local alignment (we omit the lengths of x and y.

> DynProg(x, y, DM);  
[3.9785, 3, 3]

The score of the alignment is 18.0274 and it uses only 3 amino acid from both x and y. Although DynProg does not gives us the alignment, we can deduce easily what matches occured. The best way to visualize the alignment process is to build a matrix of dimension length(x)+1 by length(y)+1.

Imagine a beautiful, aesthetic sequence of arrays showing dynamic programming performing an elegant dance over the strings x and y. This will be filled in later.

The overall score for the alignment is calculated as follows:

$\begin{eqnarray*}Score(x,y,DM) & = & DM[Sim, AToInt(T), AToInt(T)] +\\ & & DM[... ... AToInt(M)] \\ & = & 2.5168 + -1.0187 + 2.4804 \\ & = & 3.9785 \end{eqnarray*}$

The DynProg routine judiciously decided to cut the alignment at the third position because the cost of matching amino acid Y against amino acid R is -1.8160 and matching H against G is -1.4169. Inclusion of either or both matches decrease the overall score.

    x:    T     C      I      Y      G    H
    y:    T     W      M      R      H
        2.52  -1.02  2.48  -1.82  -1.42

The remaining alternative is to insert a gap somewhere in the sequence. However, any such gap will incur a penalty of at least DM[FixedDel]=-19.8137. It is easy to verify that all of the following alignments score poorer than the alignment which keeps only the first three bases.

Alternative #1:
    x:    T     C      I      Y      G      H
    y:    T     W      M             R      H
        2.52  -1.02  2.48         -1.00   6.05    -19.8 (gap)

Alternative #2:
    x:    T     C      I      Y      G      H
    y:    T     W      M                    R      
        2.52  -1.02  2.48                 0.62    -21.2100 (gap)

Alternative #3:
    x:    T     C      I      Y      G      H
    y:    T     W      M      R             H
        2.52  -1.02  2.48  -1.82          6.05    -19.8 (gap)

Alternative #4:
    x:    T     C      I             Y     
    y:    T     W      M      R      H
        2.52  -1.02  2.48  -1.82   2.1697         -19.8 (gap)

Next: The BackDynProg Function Up: Dynamic Programming Previous: Dynamic Programming

Gaston Gonnet
1998-09-15