Simple univariate statistics
can be collected using internal Darwin functions.
To collect these statistics we must initialize
the corresponding counters with the structure
> ByRefShake := Stat('similarity score with LocalAlign'): > ByDynProgr := Stat('similarity score with DynProg'):The argument to
Statcan be any name, a printable definition is convenient.
The following statements produce the random test sequences,
match them to the given fragment,
and update the statistical information
function for both a refined match against a typical random
sequence and exactly against a random sequence of
the same length.
> SeqPos := GetOffset(SearchSeq): Error, DB must be assigned a database > for i to 100 doSet
SeqPosto be like a database pointer to the given fragment and run a loop 100 times. The next statements compute a random sequence. The length of this random sequence affects the quality of the result. The longer the sequence, the better the chances of finding a good match. To solve this problem we will select a random entry in the database and use its length as the length of the random sequence. In this way, our random sequences will have the same length distribution as the database.
> RandEntry := round( Rand()*DB[TotEntries] + 0.5 ); > RandLen := length( Strings(Sequence(Entry(RandEntry)))); > RandSeq := CreateRandSeq(RandLen,AF);Now we can compute the refined match and collect its similarity
> m := LocalAlign( Match(SeqPos,GetOffset(RandSeq)), DM); > update(ByRefShake,m[Sim]);and secondly, with a sequence of the same length, we compute an exact match
> n := DynProg(SearchSeq,CreateRandSeq(Len,AF),DM,Len,Len); > update(ByDynProgr,n); > od: Error, cannot apply selector on object
We can display or use the information in two different ways.
First we use the
function which prints all the information.
Internal constants define the intervals to be at the 95%
For the refined-shaked matches, the similarity score is
> print(ByRefShake); Error, (in Stats_print) division by zero
The second method uses selectors on the
VarVariance (the variance of the
MeanVar (the mean and its 95%
confidence interval as a name),
ProbStdDev takes a probability
as an argument and returns the number of standard
deviations away from the mean
that will give this probability.
ProbStdDev can be defined in terms
of the inverse complementary error
> ProbStdDev := p -> sqrt(2)*erfcinv(2*p):The values for 0.025 and 0.001 are well known:
> ProbStdDev(0.025), ProbStdDev(0.001); 1.9600, 3.0902To expect 10 random matches, we should set the probability to be 10/n, where n is the number of entries in the database. The goal should then be
> Z := ProbStdDev( 10 / DB[TotEntries] ); Error, cannot apply selector on objectstandard deviations away from the mean random score.
From this data we can compute an approximation to the goal:
> Goal := ByRefShake[Mean] + Z*sqrt(ByRefShake[Variance]); Error, (in Stats_select) division by zero
The last piece of information needed is the maximum possible similarity score. This can be computed by adding all the corresponding diagonal entries of the Dayhoff matrix for each amino acid in the fragment:
> MaxGoal := 0: > for i to Len do > MaxGoal := MaxGoal + DM[Sim,AToInt(SearchSeq[i]),AToInt(SearchSeq[i])] > od: > MaxGoal; 45.9342Since the goal is relatively far from the maximum possible score, we are assured that the interval is not too narrow.