The preceding chapters have already introduced several procedures that behave differently depending on the number and types of the arguments passed to them. As another concrete example, consider the Darwin procedure print:

    > print(5.78);              # argument is of type real
    5.7800
    > print('a string');        # argument is of type string
    a string
    > print([1, 2, 3, 4]);      # argument is a list
    [1, 2, 3, 4]
    > print([[1, 2], [3, 4]]);  # argument is a square matrix
     1 2
     3 4

Notice that Darwin distinguishes between the list [1, 2, 3, 4] and the square matrix [[1, 2], [3, 4]], and chooses a more readable format for the latter. This behavior indicates that print must examine the type of its argument. We could imagine a procedure deep inside the machinery of Darwin that looks something like this:

    print := proc ()
      for i from 1 to length(args) do
        if type( args[i], real ) then
          call the routine for printing real values
        elif type( args[i], string ) then
          call the routine for printing strings
        elif type( args[i], list ) then
          # a matrix is also a list (of lists), so test for it inside this branch
          if type( args[i], matrix ) then
            call the routine for printing a square matrix
          else
            call the routine for printing a list
          fi;
        elif ...
          (do this for every other type in Darwin)
        fi;
      od;
    end:

This works for the built-in data types, but how should Darwin treat new, user-defined data structures? The answer is to let users write their own version of print, tailored to their data type. We shall write a simple print function for our example ProEntry from Chapter

    > ProEntry_print := proc( )
    >   protent := args;
    >   lprint('\nName: ', protent[1], ' Organism: ', protent[4]);
    >   lprint('DB: ', protent[2], ' Accession Number: ', protent[3]);
    >   lprint('Sequence length: ', protent[6], '\n ');
    >   # print the sequence one residue at a time,
    >   # starting a new line after every 50 residues
    >   for i from 1 to protent[6] do
    >     printf('%c', protent[5][i]);
    >     if (mod(i,50)=0) then
    >       printf('\n ');
    >     fi;
    >   od;
    >   printf('\n');
    > end;

The procedure name ProEntry_print was not an arbitrary choice. A name of the general form DataType_print tells Darwin that this procedure is a new version of the general procedure print. Darwin calls it whenever the argument to print has type DataType.

    > celegans := ProEntry('ABL1_CAEEL', 'SwissProt', 'P03949', 'C. ELEGANS',
    >    'NNEWCEARLYSTRKNDASNQRRLGEIGWVPSNFIAPYNSLDK', 42):
    > print(celegans);   # structure celegans has type ProEntry

    Name: ABL1_CAEEL Organism: C. ELEGANS
    DB: SwissProt Accession Number: P03949
    Sequence length: 42
     NNEWCEARLYSTRKNDASNQRRLGEIGWVPSNFIAPYNSLDK

Whereas most names refer to exactly one procedure or function, the name print now refers to a set of procedures. We say that print is an overloaded name. Overloading is a form of what is called polymorphism.
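The dispatch rule can be made concrete with a rough Python analogue. This is a sketch under stated assumptions, not Darwin's implementation. The Structure record, the ProEntry constructor and the print_ function are hypothetical names, and the exact spacing of the output is illustrative. Built-in values go through a fixed chain of type tests. A user-defined structure is handed to whatever global function is named after its type, following the DataType_print convention.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """Stand-in for a Darwin structure such as ProEntry(...): a type name plus fields."""
    typename: str
    fields: tuple


def ProEntry(*fields) -> Structure:
    # Hypothetical constructor: name, DB, accession, organism, sequence, length.
    return Structure("ProEntry", fields)


def ProEntry_print(entry: Structure) -> str:
    """Type-specific printer, found by print_ through its name alone."""
    name, db, accession, organism, sequence, length = entry.fields
    lines = [f"Name: {name}  Organism: {organism}",
             f"DB: {db}  Accession Number: {accession}",
             f"Sequence length: {length}"]
    # Wrap the sequence at 50 residues per line.
    lines += [sequence[i:i + 50] for i in range(0, len(sequence), 50)]
    return "\n".join(lines)


def print_(x) -> str:
    """Return the text an overloaded print would display for x."""
    if isinstance(x, Structure):
        # User-defined type: look for a global function named '<TypeName>_print'.
        handler = globals().get(f"{x.typename}_print")
        if handler is not None:
            return handler(x)
        return f"{x.typename}{x.fields}"
    # Built-in types: a fixed chain of type tests, like the imagined print procedure.
    if isinstance(x, float):
        return f"{x:.4f}"
    if isinstance(x, list) and x and all(isinstance(r, list) and len(r) == len(x) for r in x):
        # A square matrix is a list of equal-length lists: print one row per line.
        return "\n".join(" ".join(str(v) for v in row) for row in x)
    return str(x)
```

Adding a new type means defining one more `<TypeName>_print` function; print_ itself never has to be edited. That is the advantage of the naming convention over extending the chain of type tests.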
In Darwin there are a handful of built-in overloaded names. These
include print, HTML (§ ) and
select (§
). In general, we can overload any name
by using the Darwin statement option polymorphic. The example below
sketches how to create a polymorphic function frequency that
calculates the relative frequency of each residue (nucleotide or
amino acid) in a sequence. We will have two versions: one for DNA
sequences and one for amino acid sequences. The first step towards
creating a polymorphic function frequency is to tell Darwin that the
name frequency is polymorphic. This is accomplished by creating a
procedure that is empty except for the statement option polymorphic.

    > frequency := proc( )      # an empty procedure which states
    >   option polymorphic;     # that frequency is polymorphic
    > end;

Neither the definition of our structured types nor the way we allocate structures of these types changes:

    DNA := proc( )
      description 'A data structure to hold a DNA sequence';
      if ( nargs=0 ) then
        # DNA() creates an empty DNA structure
        return(copy(noeval(DNA(''))));
      elif ( nargs=1 ) then
        # DNA(s) stores the sequence s
        return(copy(noeval(DNA(args))));
      else
        print( DNA );
        error(' Incorrect format in structure DNA ', args);
      fi;
    end:

    AA := proc( )
      description 'A data structure to hold an AA sequence';
      if ( nargs=0 ) then
        # AA() creates an empty AA structure
        return(copy(noeval(AA(''))));
      elif ( nargs=1 ) then
        # AA(s) stores the sequence s
        return(copy(noeval(AA(args))));
      else
        print( AA );
        error(' Incorrect format in structure AA ', args);
      fi;
    end:

    > x := DNA('ACCGACGGACTACCGAGAGTCCCA');
    > y := AA('QHFPSTHEQCDNRAAATWPWYV');

The final step is to create the appropriate versions of frequency. In this case we need two of them: DNA_frequency for the DNA type and AA_frequency for the AA type. They rely on two built-in Darwin functions. NToInt converts a DNA base into an integer between 1 and 4, and AToInt converts an amino acid into an integer between 1 and 20.
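NToInt and AToInt are built into Darwin. Purely as an illustration, the hypothetical Python functions n_to_int and a_to_int below mimic them with a lookup in an alphabet string.

- The nucleotide order A, C, G, T is the one stated in this chapter.
- The amino-acid order A R N D C Q E G H I L K M F P S T W Y V is an assumption. It is consistent with the frequency(y) output shown at the end of this section.
- How Darwin treats characters outside these alphabets is not covered here; this sketch simply raises ValueError.

```python
NUCLEOTIDES = "ACGT"                  # 'A'=1, 'C'=2, 'G'=3, 'T'=4, as stated in the text
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # assumed order: 'A'=1 ... 'V'=20


def _position(symbol: str, alphabet: str) -> int:
    """1-based position of a single symbol in alphabet; ValueError if absent."""
    if len(symbol) != 1:
        raise ValueError(f"expected one character, got {symbol!r}")
    return alphabet.index(symbol) + 1  # str.index raises ValueError if not found


def n_to_int(base: str) -> int:
    """Hypothetical stand-in for Darwin's NToInt."""
    return _position(base, NUCLEOTIDES)


def a_to_int(residue: str) -> int:
    """Hypothetical stand-in for Darwin's AToInt."""
    return _position(residue, AMINO_ACIDS)
```

The integer is then used directly as an index into a counts array of length 4 or 20, which is how DNA_frequency and AA_frequency use NToInt and AToInt.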

    > DNA_frequency := proc( seq )
    >   s := seq[1];   # the sequence string stored in the DNA structure
    >   # initialize a 4 element vector to zeros
    >   total := CreateArray(1..4, 0);
    >   # NToInt(x) returns an integer between 1 and 4:
    >   # 'A'=1, 'C'=2, 'G'=3, 'T'=4
    >   for i from 1 to length(s) do
    >     total[NToInt(s[i])] := total[NToInt(s[i])] + 1;
    >   od;
    >   for i from 1 to length(total) do
    >     total[i] := total[i]/length(s);
    >   od;
    >   total;
    > end:
    >
    > AA_frequency := proc( seq )
    >   s := seq[1];   # the sequence string stored in the AA structure
    >   # initialize a 20 element vector to zeros
    >   total := CreateArray(1..20, 0);
    >   # AToInt(x) returns an integer between 1 and 20
    >   for i from 1 to length(s) do
    >     total[AToInt(s[i])] := total[AToInt(s[i])] + 1;
    >   od;
    >   for i from 1 to length(total) do
    >     total[i] := total[i]/length(s);
    >   od;
    >   total;
    > end:

To calculate the frequencies for a sequence, we need only call frequency with that sequence. Darwin determines the type of the argument and calls the matching version: DNA_frequency for x and AA_frequency for y.

    > frequency(x);
    [0.2917, 0.3750, 0.2500, 0.08333333]
    > frequency(y);
    [0.1364, 0.04545455, 0.04545455, 0.04545455, 0.04545455, 0.09090909, 0.04545455, 0, 0.09090909, 0, 0, 0, 0, 0.04545455, 0.09090909, 0.04545455, 0.09090909, 0.09090909, 0.04545455, 0.04545455]

Readers familiar with languages such as C++, Java and Smalltalk may see traces of object-oriented programming here. Strictly speaking, Darwin is not an object-oriented language. However, three features let users write programs in a style similar to that of a purely object-oriented language:

- overloading names,
- passing a variable number of arguments to a routine, and
- making routines polymorphic.

The biggest difference between Darwin and these languages lies in Darwin's scoping rules, which make it inconvenient to stay strictly within the object-oriented paradigm. Nevertheless, the authors feel that these features allow one to create robust, easy-to-understand, error-free programs. The file Sample/Entry contains a complete version of our structured type ProEntry, which we hope may serve as a prototype for new users.
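The whole frequency example can be mirrored in Python, again as a sketch under assumptions:

- polymorphic and version are hypothetical decorators.
- DNA and AA are stand-in subclasses of str.
- Dispatch uses the Python class name in place of Darwin's structure type.
- The amino-acid ordering is the assumed AToInt order.

The empty declaration plays the role of option polymorphic. Each call to frequency is routed to the function named TypeName_frequency.

```python
from typing import Callable

# Registry of type-specific versions, keyed by '<TypeName>_<name>'.
_versions: dict[str, Callable] = {}


def polymorphic(declaration: Callable) -> Callable:
    """Turn an empty declaration into a dispatcher on the first argument's type name."""
    name = declaration.__name__

    def dispatcher(first, *rest):
        key = f"{type(first).__name__}_{name}"
        if key not in _versions:
            raise TypeError(f"no version of {name} for type {type(first).__name__}")
        return _versions[key](first, *rest)

    dispatcher.__name__ = name
    return dispatcher


def version(func: Callable) -> Callable:
    """Register a function named '<TypeName>_<name>' as one version of <name>."""
    _versions[func.__name__] = func
    return func


# Stand-in structured types: a sequence string tagged with its kind.
class DNA(str):
    pass


class AA(str):
    pass


@polymorphic
def frequency(seq):
    """Empty declaration, playing the role of 'option polymorphic'."""


def composition(seq: str, alphabet: str) -> list[float]:
    """Relative frequency of each symbol of alphabet in seq (seq must be non-empty)."""
    total = [0] * len(alphabet)          # like CreateArray(1..n, 0)
    for ch in seq:
        total[alphabet.index(ch)] += 1   # ValueError for symbols outside the alphabet
    return [count / len(seq) for count in total]


@version
def DNA_frequency(seq: DNA) -> list[float]:
    return composition(seq, "ACGT")


@version
def AA_frequency(seq: AA) -> list[float]:
    return composition(seq, "ARNDCQEGHILKMFPSTWYV")  # assumed AToInt order
```

With x = DNA('ACCGACGGACTACCGAGAGTCCCA'), frequency(x) returns the fractions 7/24, 9/24, 6/24 and 2/24, which agree with the Darwin session. Python's standard library offers functools.singledispatch, which dispatches on the class of the first argument rather than on a naming convention. In a class-based language such as C++ or Java, DNA_frequency would instead become a frequency method of a DNA class, which is the resemblance the closing paragraph describes.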