Normalization

To decide whether two structures are equal, the data must first be in a normalized form. Is the number 01 equal to 1? Treated as strings, they are not; treated as integers, they are. Enforcing strict rules about how data is stored in structures yields higher-quality data, with less redundancy and less chance of error.
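The 01 versus 1 question can be checked directly. The following is a minimal sketch in Python, used here only for illustration (it is not Darwin code); the variable names are hypothetical.

```python
# The same pair of inputs compared in raw form and in normalized form.
a, b = "01", "1"

equal_as_strings = (a == b)              # character-by-character comparison
equal_as_integers = (int(a) == int(b))   # comparison after converting to int

print(equal_as_strings, equal_as_integers)
```

The two comparisons disagree, which is why a single stored representation has to be chosen before entries are tested for equality.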

Returning to our protein sequence example, suppose we are collecting protein entries from a large number of sources. Some of these sources refer to the protein database SwissProt [3] as "SP", while others write "sprot" or possibly "Swiss". If we know the alternative values for a field a priori, we can transform all entries into one normalized form.

> if ((args[2]='SP') or (args[2]='sprot') or (args[2]='Swiss')) then
>   DB_name := 'SwissProt';
> fi;
The same should be done for every field of our structured type (Name, Accession, Organism, and so forth). The return statement should then return the normalized form of each element of args. For example,
> return(noeval(ProEntry(Name, DB_name, Accession, Organism_name, Seq, Length)));
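When a field has many known spellings, a lookup table can stand in for a growing chain of or-conditions. The sketch below shows the idea in Python rather than Darwin, purely as an illustration. The names DB_ALIASES, normalize_db_name and make_pro_entry are hypothetical, the tuple is only a stand-in for the ProEntry structure, and stripping surrounding whitespace is an assumed extra rule that the text above does not prescribe.

```python
# Hypothetical alias table: every known spelling maps to one canonical name.
DB_ALIASES = {
    "SP": "SwissProt",
    "sprot": "SwissProt",
    "Swiss": "SwissProt",
}


def normalize_db_name(raw):
    """Return the canonical database name, or the input unchanged if unknown."""
    return DB_ALIASES.get(raw, raw)


def make_pro_entry(name, db_name, accession, organism, seq):
    """Build a normalized entry as a plain tuple (a stand-in for ProEntry)."""
    # Assumed rule: surrounding whitespace is never significant.
    name, db_name, accession, organism, seq = (
        field.strip() for field in (name, db_name, accession, organism, seq)
    )
    # The length is derived from the sequence so the two cannot disagree.
    return (name, normalize_db_name(db_name), accession, organism, seq, len(seq))


# Two entries that differ only in how the database is spelled.
e1 = make_pro_entry("ExampleProtein", "SP", "X00001", "Homo sapiens", "MKV")
e2 = make_pro_entry("ExampleProtein", "sprot ", "X00001", "Homo sapiens", "MKV")
print(e1 == e2)
```

Because both entries pass through the same normalization step, they compare equal even though their sources spelled the database name differently. Unknown names are passed through unchanged rather than guessed at.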



Gaston Gonnet
1998-09-15