Raw Data Files

You have collected a large body of data through experimentation, and you would now like to analyze it with a particular software tool. Unfortunately, you must first put the data into a format that the tool can understand. Your choices boil down to two: write a script in a programming language to perform the repetitive transformation, or spend the next month of your life doing it all by hand, painstakingly introducing small mistakes while developing carpal tunnel syndrome and lower back problems. Incompatible formats are certainly not a new problem in computer science, yet normalizing data can still be a laborious enterprise.

This situation commonly occurs when dealing with genetic databases: the different databases (SwissProt, EMBL, GenBank, etc.) each have their own format and tagging conventions. Before they are usable in Darwin, they must be transformed to conform to Darwin's protocols. This section introduces several commands designed to read raw data into the Darwin system, where it can later be reformatted into the appropriate structure.

The ReadRawFile command loads the entire contents of a file into a Darwin name at once. Once the contents are loaded, users no longer have to worry about performing file operations and can instead exploit the flexibility the name type has to offer.

The readlines command allows users to load a file line by line. This command is particularly useful when transforming extremely large databases (most notably nucleic acid databases) that cannot fit into your system's memory all at once.
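The trade-off between the two reading styles can be sketched outside Darwin. The following is a Python analogy only, not Darwin syntax: the function names `read_whole` and `count_matching_lines`, the sample records, and the `ID`/`SQ` tags are all hypothetical and stand in for whatever format your data actually uses. It contrasts loading a whole file into one string with streaming it a line at a time.

```python
import os
import tempfile


def read_whole(path):
    """Load the entire file into a single string (whole-file approach)."""
    with open(path, "r") as f:
        return f.read()


def count_matching_lines(path, prefix):
    """Stream the file line by line, so only one line is held at a time."""
    count = 0
    with open(path, "r") as f:
        for line in f:
            if line.startswith(prefix):
                count += 1
    return count


# Hypothetical sample data in a made-up tagged format (assumption).
sample = "ID   P1\nSQ   ACGT\nID   P2\nSQ   TTGA\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

text = read_whole(path)                   # whole file as one string
n_ids = count_matching_lines(path, "ID")  # streamed, line by line
os.remove(path)
```

The whole-file version is simpler to work with once the data is loaded, while the streaming version keeps memory use roughly constant regardless of file size, which is the reason for preferring line-by-line reading on very large databases.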

In the opposite direction, one needs to be able to send data produced during a Darwin session to an external file. This allows users to produce reports about their findings and to redirect the output of long computations to permanent files for later examination.

Gaston Gonnet
1998-09-15