The SearchDb function can search the entire DB sequence database, provided that the whole database is loaded into memory. With extremely large databases this can be a problem, particularly with DNA databases such as EMBL that are on the order of hundreds of megabytes. Even with moderate-sized databases, searches via SearchDb can be rather slow. The following total CPU times were recorded for searching the Swiss-Prot database for patterns via SearchDb.
> rtime(SearchDb('hello'));                      # a pattern which is not a DNA, RNA or AA sequence
> rtime(SearchDb('aaa'));                        # a common pattern takes a long time
> rtime(SearchDb('SLVHLRIKDRIPANNDIYVLKGDLY'));  # an AA sequence
> rtime(SearchDb('P30376'));                     # searching for an accession number
> rtime(SearchDb('143Z_SHEEP'));                 # searching for an identification name

For commonly searched strings, such as accession numbers and entry IDs, this sluggishness can be an annoyance.
To circumvent these problems, Darwin offers routines to create grid files. A grid file
DESCRIPTION GOES HERE
We present a short Darwin program that creates a grid file indexed by the AC (accession number) and ID (identification) fields of the Swiss-Prot database. An extended version of what follows can be found in the Darwin library function CreateSpGrid.
filename : name
Returns:
Builds a grid file indexed by the ID (identification) field and
AC (accession number) field of the database held in the system
variable DB. It stores this grid file in the external file filename.
IndexDB := proc( filename : name )
  # create a structure of type GridfileFormat
  format := SetGridFile(ID=string, AC=string, Start=integer, End=integer);
  gf := CreateGrid(filename, format);

  for i from 1 to DB[TotEntries] do
    entry := GFstructure();
    holder := op(String(Entry(i)));

    # index keys: the ID field and the first accession number
    entry['ID'] := SearchTag('ID', holder);
    AC := SearchTag('AC', holder);
    entry['AC'] := AC[1..CaseSearchString(';', AC)];   # text before the first ';'

    # offsets of the entry within DB, stored under the field names declared in format
    entry['Start'] := DB[Entry, i];
    entry['End'] := If( i < DB[TotEntries], DB[Entry, i+1]-1, DB[TotChars] );

    AddGrid( gf, entry );
  od;

  CompressGrid(gf);
  CloseGrid(gf);
end:
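
The idea behind indexing by ID and AC can be illustrated outside Darwin. The following is a minimal Python sketch, not Darwin code and not the grid file implementation: it assumes a toy database in a tagged layout with invented entries, and the helper names search_tag, build_index and fetch are hypothetical. It records the start and end offsets of each entry under its ID and its first accession number, so that a later lookup goes straight to the entry instead of scanning the whole text.

```python
import re

# Toy database in a tagged layout; all entries here are invented for illustration.
TOY_DB = (
    "<E><ID>TOY1_HUMAN</ID><AC>X00001; X00009;</AC><SEQ>SLVHLRIK</SEQ></E>\n"
    "<E><ID>TOY2_SHEEP</ID><AC>X00002;</AC><SEQ>DRIPANND</SEQ></E>\n"
    "<E><ID>TOY3_YEAST</ID><AC>X00003;</AC><SEQ>IYVLKGDLY</SEQ></E>\n"
)


def search_tag(tag, text):
    """Return the contents of the first <tag>...</tag> pair, or '' if absent."""
    match = re.search(r"<%s>(.*?)</%s>" % (tag, tag), text, flags=re.S)
    return match.group(1) if match else ""


def build_index(db):
    """Map each entry's ID and first accession number to its (start, end) offsets."""
    index = {}
    for match in re.finditer(r"<E>.*?</E>", db, flags=re.S):
        entry = match.group(0)
        span = (match.start(), match.end())
        # Keep only the text before the first ';', as IndexDB does for the AC field.
        first_ac = search_tag("AC", entry).split(";")[0]
        index[search_tag("ID", entry)] = span
        index[first_ac] = span
    return index


def fetch(db, index, key):
    """Return the entry text for an ID or accession number, or None if not indexed."""
    span = index.get(key)
    return None if span is None else db[span[0]:span[1]]


index = build_index(TOY_DB)
print(fetch(TOY_DB, index, "X00002"))
print(fetch(TOY_DB, index, "TOY3_YEAST"))
```

Only the first accession number of each entry is indexed in this sketch, so a secondary accession number such as X00009 is not found. This mirrors the truncation at the first ';' in IndexDB.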