The entire database Sample/SH2 is stored in Darwin as a single string of length DB[TotChars]. The accompanying figure
gives a graphical view of how the information is organized internally.
DB[string] points to the beginning of this string.
Recall that each entry from a sequence database in Darwin is wrapped
in the SGML tags <E> and </E>. To extract the entire
contents of an entry, we use the Entry structured type.
> ReadDb('Sample/SH2');
> first := Entry(1);
first := Entry(1)
> second := Entry(2);
second := Entry(2)
> last_three := Entry(76, 77, 78);
last_three := Entry(76,77,78)
> print(first);
<E><ID>ABL1_CAEEL</ID><AC>P03949;</AC><DE>TYROSINE-PROTEIN
KINASE ABL-1 (EC 2.7.1.112) (FRAGMENT).</DE><OS>CAENORHABD
ITIS ELEGANS.</OS><OC>EUKARYOTA; METAZOA; ACOELOMATES; NEM
ATODA; SECERNENTEA; RHABDITIDA.</OC><KW>TRANSFERASE; TYROS
INE-PROTEIN KINASE; SH2 DOMAIN; SH3 DOMAIN.</KW><FT>ACT_SI
TE 283 283</FT><SEQ>NNEWCEARLYSTRKNDASNQRRLGEIGWVPSNFIAPYN
SLDKYTWYHGKISRSDSEAILGSGITGSFLVRESETSIGQYTISVRHDGRVFHYRINV
DNTEKMFITQEVKFRTLGELVHHHSVHADGLICLLMYPASKKDKGRGLFSLSPNAPDE
WELDRSEIIMHNKLGGGQYGDVYEGYWKRHDCTIAVKALKEDAMPLHEFLAEAAIMKD
LHHKNLVRLLGVCTHEAPFYIITEFMCNGNLLEYLRRTDKSLLPPIILVQMASQIASG
MSYLEARHFIHRDLAARNCLVSEHNIVKIADFGLARFMKEDTYTAHAGAKFPIKWTAP
EGLAFNTFSSKSDVWAFGVLLWEIATYGMAPYPGVELSNVYGLLENGFRMDGPQGCPP
SVYRLMLQCWNWSPSDRPRFRDIHFNLENLISSNSLNDEVQKQLKKNNDKKLESDKRR
SNVRERSDSKSRHSSHHDRDRDRESLHSRNSNPEIPNRSFIRTDDSVSFFNPSTTSKV
TSFRAQGPPFPPPPQQNTKPKLLKSVLNSNARHASEEFERNEQDDVVPLAEKNVR</S
EQ></E>
> print(second);
<E><ID>ABL2_HUMAN</ID><AC>P42684;</AC><DE>TYROSINE-PROTEIN
KINASE ABL2 (EC 2.7.1.112) (TYROSINE KINASE ARG).</DE><OS
>HOMO SAPIENS (HUMAN).</OS><OC>EUKARYOTA; METAZOA; CHORDAT
A; VERTEBRATA; TETRAPODA; MAMMALIA; EUTHERIA; PRIMATES.</O
C><KW>TRANSFERASE; TYROSINE-PROTEIN KINASE; PROTO-ONCOGENE
; ATP-BINDING; PHOSPHORYLATION; SH2 DOMAIN; SH3 DOMAIN; AL
TERNATIVE SPLICING.</KW><FT>ACT_SITE 409 409</FT><SEQ>MGQQ
VGRVGEAPGLQQPQPRGIRGSSAARPSGRRRDPAGRTTETGFNIFTQHDHFASCVEDG
FEGDKTGGSSPEALHRPYGCDVEPQALNEAIRWSSKENLLGATESDPNLFVALYDFVA
SGDNTLSITKGEKLRVLGYNQNGEWSEVRSKNGQGWVPSNYITPVNSLEKHSWYHGPV
SRSAAEYLLSSLINGSFLVRESESSPGQLSISLRYEGRVYHYRINTTADGKVYVTAES
RFSTLAELVHHHSTVADGLVTTLHYPAPKCNKPTVYGVSPIHDKWEMERTDITMKHKL
GGGQYGEVYVGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCT
LEPPFYIVTEYMPYGNLLDYLRECNREEVTAVVLLYMATQISSAMEYLEKKNFIHRDL
AARNCLVGENHVVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNTFSIKSDV
WAFGVLLWEIATYGMSPYPGIDLSQVYDLLEKGYRMEQPEGCPPKVYELMRACWKWSP
ADRPSFAETHQAFETMFHDSSISEEVAEELGRAASSSSVVPYLPRLPILPSKTRTLKK
QVENKENIEGAQDATENSASSLAPGFIRGAQASSGSPALPRKQRDKSPSSLLEDAKET
CFTRDRKGGFFSSFMKKRNAPTPPKRSSSFREMENQPHKKYELTGNFSSVASLQHADG
FSFTPAQQEANLVPPKCYGGSFAQRNLCNDDGGGGGGSGTAGGGWSGITGFFTPRLIK
KTLGLRAGKPTASDDTSKPFPRSNSTSSMSSGLPEQDRMAMTLPRNCQRSKLQLERTV
STSSQPEENVDRANDMLPKKSEESAAPSRERPKAKLLPRGATALPLRTPSGDLAITEK
DPPGVGVAGVAAAPKGKEKNGGARLGMAGVPEDGEQPGWPSPAKAAPVLPTTHNHKVP
VLISPTLKHTPADVQLIGTDSQGNKFKLLSEHQVTSSGDKDRPRRVKPKCAPPPPPVM
RLLQHPSICSDPTEEPTALTAGQSTSETQEGGKKAALGAVPISGKAGRPVMPPPQVPL
PTSSISPAKMANGTAGTKVALRKTKQAAEKISADKISKEALLECADLLSSALTEPVPN
SQLVDTGHQLLDYCSGYVDCIPQTRNKFAFREAVSKLELSLQELQVSSAAAGVPGTNP
VLNNLLSCVQEISDVVQR</SEQ></E>
> print(last_three);
<E><ID>YRK_CHICK</ID><AC>Q02977;</AC><DE>PROTO-ONCOGENE TYR
OSINE-PROTEIN KINASE YRK (EC 2.7.1.112) (P60-YRK) (YES REL
ATED KINASE).</DE><OS>GALLUS GALLUS (CHICKEN).</OS><OC>EUK
...
<E><ID>ZA70_HUMAN</ID><AC>P43403;</AC><DE>TYROSINE-PROTEIN
KINASE ZAP-70 (EC 2.7.1.112) (70 KD ZETA-ASSOCIATED PROTEI
N).</DE><OS>HOMO SAPIENS (HUMAN).</OS><OC>EUKARYOTA; METAZ
...
<E><ID>ZA70_MOUSE</ID><AC>P43404;</AC><DE>TYROSINE-PROTEIN
KINASE ZAP-70 (EC 2.7.1.112) (70 KD ZETA-ASSOCIATED PROTE
IN).</DE><OS>MUS MUSCULUS (MOUSE).</OS><OC>EUKARYOTA; META
...
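As a rough mental model of the storage described above, the following Python sketch (not Darwin code) treats the database as one long string and picks out the nth <E>...</E> block. The names DB_STRING, ENTRY_RE and entry_text are hypothetical, the two toy entries are heavily truncated, and Darwin's actual internal lookup is not shown in this section, so this only illustrates the idea.

```python
import re

# Toy stand-in for the database string; real entries are far longer and
# the sequences below are truncated purely for illustration.
DB_STRING = (
    "<E><ID>ABL1_CAEEL</ID><AC>P03949;</AC><SEQ>NNEWCEARLY</SEQ></E>\n"
    "<E><ID>ABL2_HUMAN</ID><AC>P42684;</AC><SEQ>MGQQVGRVGE</SEQ></E>\n"
)

# One entry is everything from <E> up to the nearest </E>.
ENTRY_RE = re.compile(r"<E>.*?</E>", re.DOTALL)


def entry_text(db_string, n):
    """Return the full text of the n-th entry (1-based), tags included."""
    entries = ENTRY_RE.findall(db_string)
    if not 1 <= n <= len(entries):
        raise IndexError("entry number out of range: %d" % n)
    return entries[n - 1]


print(entry_text(DB_STRING, 2))
```

Entry numbers are 1-based here to mirror Entry(1) and Entry(2) in the session above.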
We can isolate the contents of a specific SGML tag by indexing the Entry
with the tag name, written in single quotes inside square brackets.
> first['ID']; # get the identification tag of the 1st entry
ABL1_CAEEL
> first['SEQ']; # get the sequence for the 1st entry
NNEWCEARLYSTRKNDASNQRRLGEIGWVPSNFIAPYNSLDKYTWYHGKI
..(557).. DVVPLAEKNVR
> second['FT'];
ACT_SITE 409 409
> last_three['DE']; # get the description tag
# for the last three entries
[PROTO-ONCOGENE TYROSINE-PROTEIN KINASE YRK (EC 2.7.1.112) (P60-YRK) (YES REL\
ATED KINASE).,
TYROSINE-PROTEIN KINASE ZAP-70 (EC 2.7.1.112) (70 KD ZETA-ASSOCIATED PROTEIN).,
TYROSINE-PROTEIN KINASE ZAP-70 (EC 2.7.1.112) (70 KD ZETA-ASSOCIATED PROTEIN).]
Notice that when an Entry structure has only a single posint parameter, as is the case with first and second
above, selecting a specific tag returns the
contents of that field as a string object.
When more than one entry is specified, as is the case with last_three, the selection returns a list of string objects. The
ith element of this list corresponds to the ith parameter of
Entry.
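The return-type rule can be summarized with a small Python model (again, not Darwin code). The helper names tag_content and select are hypothetical, the toy entries carry only two tags each, and returning None for a missing tag is an assumption of this sketch rather than documented Darwin behaviour.

```python
import re


def tag_content(entry_text, tag):
    """Return the text between <tag> and </tag>, or None if the tag is absent.

    Returning None for a missing tag is an assumption of this sketch.
    """
    pattern = r"<{0}>(.*?)</{0}>".format(re.escape(tag))
    match = re.search(pattern, entry_text, re.DOTALL)
    return match.group(1) if match else None


def select(entries, tag):
    """One entry yields a string; several yield a list in parameter order."""
    values = [tag_content(e, tag) for e in entries]
    return values[0] if len(values) == 1 else values


# Toy entries, truncated to two tags each for illustration.
first = "<E><ID>ABL1_CAEEL</ID><AC>P03949;</AC></E>"
second = "<E><ID>ABL2_HUMAN</ID><AC>P42684;</AC></E>"

print(select([first], "ID"))          # ABL1_CAEEL
print(select([first, second], "AC"))  # ['P03949;', 'P42684;']
```

The list result keeps the order of the entries passed in, matching the statement that the ith element corresponds to the ith parameter of Entry.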