Next: Defining New Structured Types: Up: An Introduction to Darwin Previous: Searching

Structured Types

Section [*] of Chapter [*] introduced a number of Darwin built-in data types. A complete list of built-in types can be found in Table [*]. We have also seen how elements of different types can be included in array and set data structures. The array classics from the previous section is an example of such a heterogeneous data structure.

> ReadProgram('Sample/arrays'):
> print(classics);
 [Joseph Felsenstein, 
Phylogenies from molecular sequences:  inference and reliability, 
Annual Revue of Genetics, 1988, 22, 521--565]
 [Smith, Temple F. and Waterman, Michael S., 
Identification of common molecular subsequences, J. Mol. Biol., 1981, 147, 
 [Dayhoff, Margaret O. and Schwartz, R. M. and Orcutt, B. C., 
A model for evolutionary change in proteins, 
Atlas of Protein Sequence and Structure, 1978, 5, 345--352]
 [Needleman, S. B. and Wunsch, C. D., A general method applicable to the search \
for similarities in the amino acid sequence of two proteins, J. Mol. Biol., 1970
, 48, 443--453]
However, as the diversity of data in our programs becomes large, types such as list and set become increasingly insufficient for maintaining clean, readable programs. Even for the relatively small example of classics, the onus is on the programmer to remember which field corresponds to which item of information. Most programming languages, including Darwin, offer some way of creating structured data types to alleviate these problems.

Table: A complete list of built-in Darwin types.
Type Name    Description                                             Example
constant     Any number or variable assigned a number.               1, 33, 203, 39293
boolean      A Boolean value.                                        true, false
real         Real numbers.                                           1, 1.1, 1.01, 1.001, ...
integer      Integers.                                               ..., -2, -1, 0, 1, 2, ...
posint       Integers greater than zero                              1, 2, 3, ...
string       Any sequence of symbols surrounded by single quotes     'hello'
anything     Any built-in type                                       posint, string
uneval       An unevaluated procedure call.                          undefined()
procedure    A Darwin routine                                        mod, sin, exp
equation     A sequence of the form                                  y=5, x=z
range        x..y where x,y are of type real                         1..100
list, array  an ordered multiset (surrounded by [, ] symbols)        [a,b, c]
set          an unordered set (surrounded by {, } or (* *) symbols)  {1, 2, 3}
matrix       a two-dimensional square array                          [[1, 2], [3, 4]]
database     A Darwin sequence database                              DB
grid         A Darwin grid file                                      g where
structure    A built-in Darwin structured type                       Tree, Match, Gene
symbol       A sequence of symbols.                                  x, xyz, hello_there
name         A legal Darwin name.                                    abc, a1, x

A structured data type in Darwin is built using functions. The best way to learn how to define a structured type in Darwin is through examples. Let us begin by defining a new type nonnegint (non-negative integer) which consists of posint (positive integers) and the value 0.

> nonnegint := proc( x : {0, posint})
>   return(noeval(nonnegint(x)));
> end:
This function accepts any argument which is either of type posint or has the value 0. The type checking is done within the formal parameter list. The body of the function simply returns an unevaluated call to nonnegint with the parameter x. The command noeval (no evaluation) delays the evaluation of its argument. Darwin simply returns the parameter of noeval as an object of type name. We can define variables of type nonnegint by calling the function with an appropriate argument. Observe what happens when the argument is not of the correct type.
> a := nonnegint(0);
a := nonnegint(0)
> b := nonnegint(5);
b := nonnegint(5)
> c := nonnegint(1388293823);
c := nonnegint(1388293823)
> d := nonnegint(-1);
nonnegint expects a 1st argument, x:{0,posint}, found: -1
Error, invalid arguments
A variable (an instance) of a structured type is simply an unevaluated procedure call in Darwin. There are two ways a procedure call can remain unevaluated. The first is through the use of the noeval command described above. When we wrap the procedure call in a noeval command, Darwin just returns the procedure call and contents as is.
> delayed := noeval(factorial(5));
delayed := factorial(5)
> frustrated := noeval(print('I so desperately want to simplify'));
frustrated := print(I so desperately want to simplify)

The second way a procedure call will remain unevaluated is when the procedure name is undefined in the scope.

> any := thing_goes(1, 2, 3);
any := thing_goes(1,2,3)
> I := have_unlimited('freedom');
I := have_unlimited(freedom)
> to_do := as_I_like(abc, 123, ['a', 'list'] );
to_do := as_I_like(abc,123,[a, list])
No procedures named thing_goes, have_unlimited or as_I_like are defined, so Darwin simply assigns the unevaluated name and parameters to the variables any, I and to_do. What type of data do these structured types allow? The answer is, literally, anything.

The variables which we create in this manner are assigned the type uneval (the values assigned to them look like unevaluated procedure calls).

> type(any, uneval);
> type(I, uneval);
> type(to_do, uneval);
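
Returning to the classics array, the same mechanism can give a bibliographic entry a name of its own. The following is a minimal sketch only: Citation is a hypothetical name chosen for illustration and is assumed to have no procedure defined for it in the session, and the # comment syntax is assumed to follow Darwin's Maple-like conventions. Output is omitted.

```darwin
> # Citation is a hypothetical, undefined name, so the call below
> # stays unevaluated and is stored as written.
> felsenstein := Citation('Joseph Felsenstein', 1988, 22):
> # According to the text above, the stored value is of type uneval.
> type(felsenstein, uneval);
```

The entry still relies on argument position to tell the year from the volume, but the name Citation now records what kind of object the data represents.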

The manner in which one goes about defining structured types may seem a little abstract and, if you are an experienced programmer, quite different from what you are used to in languages such as C or Pascal. But bear with us: this method of structured type creation turns out to be extremely versatile. As in our example nonnegint, we can place an unlimited amount of code in the procedure which defines a type. This allows us to perform an arbitrary amount of type checking and manipulation of data.
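
As a rough illustration of checking that goes beyond the formal parameter list, the sketch below follows the pattern of nonnegint but adds a test in the procedure body. The type name month is hypothetical, and the if ... then ... fi statement and the error command are assumed here from Darwin's Maple-like syntax; neither is introduced in this section.

```darwin
> # month is a hypothetical type: a positive integer no larger than 12.
> month := proc( m : posint )
>   # The parameter list already rejects anything that is not a posint;
>   # the body adds the upper bound (if/fi and error are assumed syntax).
>   if m > 12 then
>     error('month expects an integer between 1 and 12');
>   fi;
>   # Return the unevaluated call, exactly as nonnegint does.
>   return(noeval(month(m)));
> end:
> june := month(6);
```

With this definition, a call such as month(13) passes the posint check in the parameter list but is meant to be rejected by the test in the body.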

Darwin comes equipped with several built-in structured types. Table [*] contains a sample of the most used structures. Chapter [*] - Iteration and Recursion shows an application using the Tree structure. The remaining entries are explored in greater depth in Part [*] - Darwin and Problems from Biochemistry.

Table: A list of built-in Darwin structured types.
Type Name    Description
Match        Used for pairwise alignments.
NucPepMatch  Used for matching DNA with AA sequences
Gene         Used to hold a gene and annotation information
Tree         Used to hold a tree structure
Stat         Used in the collection of statistical
PatEntries   Used in the construction of patricia
Graph        Used to hold combinatorial graphs.
Domain       Used to hold a domain structure
MultiAlign   Used to hold sequence for a multiple

Gaston Gonnet