
   
Polymorphism

The preceding chapters have already introduced a number of procedures that act differently depending on the number and type of the parameters passed to them. As another concrete example, consider the Darwin procedure print.

> print(5.78);                 # argument is of type real
5.7800
> print('a string');           # argument is of type string.
a string
> print([1, 2, 3, 4]);         # argument is a list
[1, 2, 3, 4]
> print([[1, 2], [3, 4]]);     # argument is a square matrix
 1 2
 3 4
Notice that Darwin distinguishes between the list [1, 2, 3, 4] and the square matrix [[1, 2], [3, 4]], and chooses a nicer format for the latter. This behavior shows that print must examine the type of its argument. We could imagine a procedure deep inside the machinery of Darwin that looks something like this:
 print := proc ()
   for i from 1 to length(args) do
     if (type( args[i], real )) then
       # call the routine for printing out real values
     elif (type( args[i], string )) then
       # call the routine for printing out strings
     elif (type( args[i], list )) then
       if (type( args[i], matrix )) then
         # call the routine for printing out a square matrix
       else
         # call the routine for printing out a list
       fi;
     # elif ...  (one branch for every other type in Darwin)
     fi;
   od;
 end:
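The imagined procedure can be mimicked in Python as a chain of type tests. This is a minimal sketch, not Darwin's implementation: the names `format_value` and `darwin_style_print`, and the formatting rules (four decimals for reals, one matrix row per line), are assumptions chosen to reproduce the session above.

```python
def is_square_matrix(value):
    """True if value is a non-empty list of lists, each as long as the outer list."""
    return (
        isinstance(value, list)
        and len(value) > 0
        and all(isinstance(row, list) and len(row) == len(value) for row in value)
    )


def format_value(value):
    """Pick a format by inspecting the argument's type, as print is imagined to do."""
    if isinstance(value, float):
        return f"{value:.4f}"
    elif isinstance(value, str):
        return value
    elif isinstance(value, list):
        # the matrix test comes first, since every matrix is also a list
        if is_square_matrix(value):
            return "\n".join(" " + " ".join(str(x) for x in row) for row in value)
        return "[" + ", ".join(str(x) for x in value) + "]"
    # ... one branch for every other built-in type would go here
    raise TypeError(f"no print routine for {type(value).__name__}")


def darwin_style_print(*args):
    """Print each argument on its own, like print(a, b, ...) in the session."""
    for arg in args:
        print(format_value(arg))


darwin_style_print(5.78, "a string", [1, 2, 3, 4], [[1, 2], [3, 4]])
```

The final `raise` shows the weakness of this design: every new data type forces an edit to this one central function.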
This works well for the built-in data types, but how should Darwin determine how to treat new user-defined data structures? The answer is to let users write their own specially tailored version of print for their data type. We shall write a simple print function for our example ProEntry from Chapter [*].
> ProEntry_print := proc(  ) 
>   protent:=args;
>   lprint('\nName: ', protent[1], '  Organism: ', protent[4]);
>   lprint('DB: ', protent[2], '  Accession Number: ', protent[3]);
>   lprint('Sequence length: ', protent[6], '\n     ');
>   for i from 1 to protent[6] do
>     printf('%c',protent[5][i]);    # print one residue
>     if (mod(i,50)=0) then          # start a new line after every 50 residues
>       printf('\n     ');
>     fi;
>   od;
>   printf('\n');
> end;
The procedure name ProEntry_print was not an arbitrary choice. The general format DataType_print indicates to Darwin that this is a new form of the general procedure print which is to be called when the type of the argument to print is DataType.
> celegans := ProEntry('ABL1_CAEEL', 'SwissProt', 'P03949', 'C. ELEGANS', 
>                'NNEWCEARLYSTRKNDASNQRRLGEIGWVPSNFIAPYNSLDK', 42):
> print(celegans);     # structure celegans has type ProEntry

Name:  ABL1_CAEEL   Organism:  C. ELEGANS
DB:  SwissProt   Accession Number:  P03949
Sequence length:  42     
 
NNEWCEARLYSTRKNDASNQRRLGEIGWVPSNFIAPYNSLDK
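The DataType_print naming convention can be imitated in Python by building the handler name from the argument's type and looking it up at call time. This is a minimal sketch, assuming the module namespace serves as the registry; `overloaded_print` and the `ProEntry` dataclass are hypothetical stand-ins, with the field order taken from the ProEntry call above.

```python
from dataclasses import dataclass


@dataclass
class ProEntry:
    # same field order as ProEntry(name, DB, accession, organism, sequence, length)
    name: str
    db: str
    accession: str
    organism: str
    sequence: str
    length: int


def ProEntry_print(entry):
    """User-supplied printer, found only because of its <TypeName>_print name."""
    lines = [
        f"Name: {entry.name}  Organism: {entry.organism}",
        f"DB: {entry.db}  Accession Number: {entry.accession}",
        f"Sequence length: {entry.length}",
    ]
    # wrap the sequence into lines of 50 residues, indented by 5 spaces
    for start in range(0, len(entry.sequence), 50):
        lines.append("     " + entry.sequence[start:start + 50])
    return "\n".join(lines)


def overloaded_print(value, namespace=None):
    """Call <TypeName>_print if such a function exists, else fall back to str()."""
    namespace = globals() if namespace is None else namespace
    handler = namespace.get(type(value).__name__ + "_print")
    text = handler(value) if callable(handler) else str(value)
    print(text)
    return text


celegans = ProEntry("ABL1_CAEEL", "SwissProt", "P03949", "C. ELEGANS",
                    "NNEWCEARLYSTRKNDASNQRRLGEIGWVPSNFIAPYNSLDK", 42)
overloaded_print(celegans)   # dispatches to ProEntry_print
overloaded_print(5.78)       # no float_print exists, so str() is used
```

Adding support for another type only requires defining another `<TypeName>_print` function; `overloaded_print` itself never changes.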
Whereas most names refer to exactly one procedure or function, the name print has now been assigned to a set of procedures. We say that print is an overloaded name. Overloading is a form of what is called polymorphism.[10.1]

In Darwin there are a handful of built-in overloaded names, including print, HTML (§ [*]) and select (§ [*]). In general, any name can be overloaded with the statement option polymorphic. The example below sketches how to create a polymorphic function frequency, which calculates the relative frequency of each symbol (nucleotide or amino acid) in a sequence. We will have two versions: one for DNA sequences and one for amino acid sequences. The first step is to tell Darwin that the name frequency is polymorphic. This is done by creating a procedure that is empty except for the statement option polymorphic.

> frequency := proc( )              # an empty procedure which states
>   option polymorphic;             #   that frequency is polymorphic
> end;
Nothing changes in how we define our structured types or in how we allocate structures of these types.
 DNA := proc( )
   description 'A data structure to hold a DNA sequence';
   if ( nargs=0 ) then
     return(copy(noeval(DNA(''))));
   elif ( nargs=1 ) then
     return(copy(noeval(DNA(args))));
   else
     print( DNA );
     error(' Incorrect format in structure DNA ', args);
   fi;
 end:

 AA := proc( )
   description 'A data structure to hold an AA sequence';
   if ( nargs=0 ) then
     return(copy(noeval(AA(''))));
   elif ( nargs=1 ) then
     return(copy(noeval(AA(args))));
   else
     print( AA );
     error(' Incorrect format in structure AA ', args);
   fi;
 end:

> x := DNA('ACCGACGGACTACCGAGAGTCCCA');
> y := AA('QHFPSTHEQCDNRAAATWPWYV');
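Both Darwin constructors follow the same pattern: no argument gives an empty sequence, one argument is stored, and anything else is an error. A Python sketch can share that logic in a base class. The class `Structure` and the attribute `seq` are hypothetical names; this illustrates the pattern only and says nothing about how Darwin stores structures internally.

```python
class Structure:
    """Hypothetical base: a one-field structure holding a sequence string."""

    def __init__(self, *args):
        if len(args) == 0:
            self.seq = ""            # like DNA('') when called with no arguments
        elif len(args) == 1:
            self.seq = args[0]
        else:
            raise TypeError(
                f"Incorrect format in structure {type(self).__name__}: {args!r}")

    def __repr__(self):
        return f"{type(self).__name__}({self.seq!r})"


class DNA(Structure):
    """A data structure to hold a DNA sequence."""


class AA(Structure):
    """A data structure to hold an amino acid sequence."""


x = DNA("ACCGACGGACTACCGAGAGTCCCA")
y = AA("QHFPSTHEQCDNRAAATWPWYV")
print(x, y)
```

Because `DNA` and `AA` are distinct classes, code that inspects `type(x)` can tell the two sequences apart even though they store data the same way.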
The final step is to create the appropriate versions of frequency. Here we need two: DNA_frequency for the DNA type and AA_frequency for the AA type. They use two built-in Darwin functions: NToInt converts a DNA base into an integer between 1 and 4, and AToInt converts an amino acid into an integer between 1 and 20.
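The role of NToInt and AToInt can be sketched in Python as index lookups into a fixed alphabet. The DNA order A=1, C=2, G=3, T=4 is taken from the comment in DNA_frequency; the amino acid order (alphabetical by three-letter code, ARNDCQEGHILKMFPSTWYV) is an assumption, chosen because it is consistent with the frequency(y) output shown later in this section. The names `n_to_int` and `a_to_int` are hypothetical.

```python
DNA_ALPHABET = "ACGT"  # 'A'=1, 'C'=2, 'G'=3, 'T'=4
# Assumed order: Ala Arg Asn Asp Cys Gln Glu Gly His Ile
#                Leu Lys Met Phe Pro Ser Thr Trp Tyr Val
AA_ALPHABET = "ARNDCQEGHILKMFPSTWYV"


def n_to_int(base):
    """Map a DNA base to 1..4 (hypothetical stand-in for NToInt).

    Raises ValueError for symbols outside ACGT; how Darwin's NToInt
    treats other symbols is not covered by this sketch.
    """
    return DNA_ALPHABET.index(base.upper()) + 1


def a_to_int(residue):
    """Map an amino acid letter to 1..20 (hypothetical stand-in for AToInt)."""
    return AA_ALPHABET.index(residue.upper()) + 1


print([n_to_int(b) for b in "ACGT"])
```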
> DNA_frequency := proc( seq )
>   total := CreateArray(1..4, 0);     # one counter per base
>   # NToInt(x) returns an integer between 1 and 4:
>   # 'A'=1, 'C'=2, 'G'=3, 'T'=4
>   for i from 1 to length(seq) do
>     total[NToInt(seq[i])] := total[NToInt(seq[i])] + 1;
>   od;
>   for i from 1 to length(total) do   # turn counts into frequencies
>     total[i] := total[i]/length(seq);
>   od;
>   total;
> end:
>
> AA_frequency := proc( seq )
>   # initialize a 20 element vector to zeros
>   total := CreateArray(1..20, 0);
>   # AToInt(x) returns an integer between 1 and 20
>   for i from 1 to length(seq) do
>     total[AToInt(seq[i])] := total[AToInt(seq[i])] + 1;
>   od;
>   for i from 1 to length(total) do   # turn counts into frequencies
>     total[i] := total[i]/length(seq);
>   od;
>   total;
> end:
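The two procedures differ only in the alphabet they count over, so a Python sketch can factor the arithmetic into one helper. This is an illustration, not a translation of Darwin's built-ins: `symbol_frequency` is a hypothetical name, `alphabet.index` stands in for NToInt/AToInt (0-based here), and the amino acid order is the same assumption as above.

```python
def symbol_frequency(sequence, alphabet):
    """Relative frequency of each symbol of alphabet in sequence, in alphabet order."""
    totals = [0] * len(alphabet)          # one counter per symbol
    for symbol in sequence:
        # alphabet.index plays the role of NToInt/AToInt (0-based here)
        totals[alphabet.index(symbol)] += 1
    if not sequence:                      # avoid dividing by zero
        return [0.0] * len(alphabet)
    return [count / len(sequence) for count in totals]


def dna_frequency(sequence):
    return symbol_frequency(sequence, "ACGT")


def aa_frequency(sequence):
    # assumed residue order (alphabetical by three-letter code)
    return symbol_frequency(sequence, "ARNDCQEGHILKMFPSTWYV")


print(dna_frequency("ACCGACGGACTACCGAGAGTCCCA"))
```

The Darwin text keeps two separate procedures on purpose: each one is the type-specific version that polymorphic dispatch selects.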
To calculate the frequency for a sequence, we need only call the function frequency with the sequence. Darwin determines the type of the sequence and calls the appropriate version of the function.
> frequency(x);
[0.2917, 0.3750, 0.2500, 0.08333333]
> frequency(y);
[0.1364, 0.04545455, 0.04545455, 0.04545455, 0.04545455,  0.09090909, 
0.04545455, 0, 0.09090909, 0, 0, 0, 0, 0.04545455, 0.09090909, 
0.04545455, 0.09090909, 0.09090909, 0.04545455, 0.04545455]
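Python's `functools.singledispatch` is a close standard-library analogue of this mechanism. A generic function is declared once, like the empty procedure carrying option polymorphic, and implementations are registered per argument type. Unlike Darwin, it dispatches on the class of the first argument rather than on a DataType_frequency naming convention, so treat this as an analogy. The `DNA` and `AA` dataclasses and the amino acid order are assumptions.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class DNA:
    seq: str


@dataclass
class AA:
    seq: str


@singledispatch
def frequency(arg):
    # plays the role of the empty "option polymorphic" procedure
    raise TypeError(f"no frequency defined for {type(arg).__name__}")


def _relative_counts(sequence, alphabet):
    totals = [0] * len(alphabet)
    for symbol in sequence:
        totals[alphabet.index(symbol)] += 1
    return [count / len(sequence) for count in totals]


@frequency.register
def _(arg: DNA):
    # counterpart of DNA_frequency
    return _relative_counts(arg.seq, "ACGT")


@frequency.register
def _(arg: AA):
    # counterpart of AA_frequency; the residue order is an assumption
    return _relative_counts(arg.seq, "ARNDCQEGHILKMFPSTWYV")


x = DNA("ACCGACGGACTACCGAGAGTCCCA")
y = AA("QHFPSTHEQCDNRAAATWPWYV")
print([round(f, 4) for f in frequency(x)])   # → [0.2917, 0.375, 0.25, 0.0833]
```

Registering on the type annotation requires Python 3.7 or later.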
Readers familiar with languages such as C++, Java and Smalltalk may see traces of object-oriented programming here. Strictly speaking, Darwin is not an object-oriented language, but the ability to overload names, to pass a variable number of arguments to a routine and to make routines polymorphic lets users write programs in a style similar to what one would expect in a purely object-oriented language. The biggest difference between Darwin and these languages lies in its scoping rules, which make it inconvenient to stay strictly within the object-oriented paradigm. Nevertheless, the authors feel that these constructs allow one to create robust, easy-to-understand and error-free programs. The file Sample/Entry contains a complete version of our structured type ProEntry, which we hope may serve as a prototype for new users.[10.2]


Gaston Gonnet
1998-09-15