The first Darwin routine for creating an alignment between sequences is the GlobalAlign routine.
m  : Match
DM : DayMatrix
Returns: Match
Synopsis: Using dynamic programming and the Dayhoff matrix DM, GlobalAlign creates the alignment between the sequences m1 and m2 contained in the Match structure m.
The similarity scoring in the dynamic programming is done relative to DM; therefore, the PAM distance of the alignment is the PAM distance of DM.
If the Length1 and Length2 fields
of the m structure have not been set (i.e. they equal 0), GlobalAlign
finds the lengths which maximize the score. Note that this score is not
necessarily the global best score but the maximum score
found by proceeding left to right through m1 and
m2 and cutting the right tails (if necessary). If the lengths are
defined, GlobalAlign finds the alignment which maximizes the
similarity score and forces the overall length of the alignment to be
max(length(m1), length(m2)).
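The two modes described above can be illustrated outside Darwin with a small Python sketch. It is not Darwin's implementation: it assumes a toy scoring scheme (fixed match, mismatch and linear gap costs) in place of a Dayhoff matrix and Darwin's own gap penalties, and the function names are hypothetical. The table entry S[i][j] holds the best score for aligning the first i residues of one sequence with the first j residues of the other. Taking the entry for the full lengths corresponds to the case where the lengths are defined; taking the maximum over all entries corresponds to cutting the right tails.

```python
def prefix_scores(a, b, match=2, mismatch=-1, gap=-2):
    """Return S where S[i][j] is the best score aligning a[:i] with b[:j].

    Toy scoring (assumption): fixed match/mismatch values and a linear
    gap cost, not a Dayhoff matrix.
    """
    n, m = len(a), len(b)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap              # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        S[0][j] = j * gap              # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                          S[i - 1][j] + gap,       # gap in b
                          S[i][j - 1] + gap)       # gap in a
    return S


def global_score(a, b):
    """Lengths fixed: score of aligning both sequences end to end."""
    return prefix_scores(a, b)[len(a)][len(b)]


def best_prefix(a, b):
    """Lengths free: (score, length1, length2) maximizing the prefix score."""
    S = prefix_scores(a, b)
    return max((S[i][j], i, j)
               for i in range(len(a) + 1)
               for j in range(len(b) + 1))


print(global_score("ACGT", "ACGTTTTT"))   # full lengths forced
print(best_prefix("ACGT", "ACGTTTTT"))    # right tail of the longer sequence cut
```

With these toy costs, forcing the full lengths pays for four gap columns, while the free-length variant stops after the four matching residues.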
The GlobalAlign function is used when
> DB:=ReadDb('Sample/SH2'):
> CreateDayMatrices():   # calculate matrix DM
> m1 := Match(op(Sequence(Entry(1))), op(Sequence(Entry(2))));
> m1 := GlobalAlign(m1, DM);   # DM is PAM 250
m1 := Match(1127.6,367,1338,492,618,250)
> print(m1);
lengths=492,618 simil=1127.6, PAM_dist=250, offsets=367,1338, identity=41.3%, similarity=23.5%
ID=ABL1_CAEEL AC=P03949; DE=TYROSINE-PROTEIN KINASE ABL-1 (EC 2.7.1.112) (FRAGMENT). OS=CAENORHABDITIS ELEGANS.
ID=ABL2_HUMAN AC=P42684; DE=TYROSINE-PROTEIN KINASE ABL2 (EC 2.7.1.112) (TYROSINE KINASE ARG). OS=HOMO SAPIENS (HUMAN).

NNE_____________________________________________________________________________
.::
MGQQVGRVGEAPGLQQPQPRGIRGSSAARPSGRRRDPAGRTTETGFNIFTQHDHFASCVEDGFEGDKTGGSSPEALHRPY

______________WCEAR_____________________________________LYSTRKNDASNQRRLGEIGWVPSN
              |::.:                                     | .::::! |:.|..:..||||||
GCDVEPQALNEAIRWSSKENLLGATESDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNQNGEWSEVRSKNGQGWVPSN

FIAPYNSLDKYTWYHGKISRSDSEAILGSGITGSFLVRESETSIGQYTISVRHDGRVFHYRINVDNTEKMFITQEVKFRT
!|:|.|||!|::||||.!|||.:|.!|:| |:|||||||||:| ||.:||:|:!|||!|||||:....|:!!|.|.!|.|
YITPVNSLEKHSWYHGPVSRSAAEYLLSSLINGSFLVRESESSPGQLSISLRYEGRVYHYRINTTADGKVYVTAESRFST

LGELVHHHSVHADGLICLLMYPASKKDKGRGLFSLSPNAPDEWELDRSEIIMHNKLGGGQYGDVYEGYWKRHDCTIAVKA
|:|||||||:.||||!..|.|||:| :|.. :!::|| ..|:||!!|:!|.|::||||||||!||.|.||!::.|!|||:
LAELVHHHSTVADGLVTTLHYPAPKCNKPT_VYGVSP_IHDKWEMERTDITMKHKLGGGQYGEVYVGVWKKYSLTVAVKT

LKEDAMPLHEFLAEAAIMKDLHHKNLVRLLGVCTHEAPFYIITEFMCNGNLLEYLRRTDKSLLPPIILVQMASQIASGMS
||||:|.::|||.|||!||!!:|.|||:||||||.|:||||!||!| .||||!|||:.:!: :::!!|:.||:||:|:|:
LKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTLEPPFYIVTEYMPYGNLLDYLRECNREEVTAVVLLYMATQISSAME

YLEARHFIHRDLAARNCLVSEHNIVKIADFGLARFMKEDTYTAHAGAKFPIKWTAPEGLAFNTFSSKSDVWAFGVLLWEI
|||.!:|||||||||||||:|::!||!|||||:|:|:.|||||||||||||||||||:||!||||.||||||||||||||
YLEKKNFIHRDLAARNCLVGENHVVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNTFSIKSDVWAFGVLLWEI

ATYGMAPYPGVELSNVYGLLENGFRMDGPQGCPPSVYRLMLQCWNWSPSDRPRFRDIHFNLENLISSNSLNDEVQKQLKK
|||||:||||!!||:||:|||:|!||!.|:||||:||:||..||:|||:|||.|.!.| .:|:!:.::|!:!||.::|.!
ATYGMSPYPGIDLSQVYDLLEKGYRMEQPEGCPPKVYELMRACWKWSPADRPSFAETHQAFETMFHDSSISEEVAEELGR

NNDKKLESDKRRSNVRERSDSKSRHSSHHDRDRDRESLHSRNSNPEIPNRSFIRTDDSVS
..:::........ .. .|::!:.:::.::!!: :.:.::.:::::....:|||..::.|
AASSSSVVPYLPRLPILPSKTRTLKKQVENKENIEGAQDATENSASSLAPGFIRGAQASS
The similarity score for this alignment is extremely high at
1127.6.
This implies that the two sequences are 10^112.76 times more likely
to derive from a common ancestor than to have been aligned
by chance.
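The exponent can be recovered from the score by dividing by 10: the two figures quoted above (1127.6 and 112.76) are consistent with the score being ten times the base-10 logarithm of the likelihood ratio. A minimal Python sketch of that arithmetic, assuming this scaling; the helper name is hypothetical and not a Darwin function:

```python
import math


def odds_from_score(score):
    """Split 10**(score/10) into (mantissa, exponent).

    Assumes score = 10 * log10(likelihood ratio). Returning the two
    parts separately keeps very large ratios readable.
    """
    log10_odds = score / 10.0
    exponent = math.floor(log10_odds)
    mantissa = 10 ** (log10_odds - exponent)
    return mantissa, exponent


mantissa, exponent = odds_from_score(1127.6)
print(f"{mantissa:.2f}e{exponent}")   # about 5.75e112
```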
The presentation of the alignment information is self-explanatory. The line between the sequences is intended to give users a quick graphical indication of the quality of the alignment. Each character in this middle line indicates the quality of the match at that position.
Symbol    Meaning
|         exact match
!         very good match
:         good match
.         poor match
(blank)   very poor match
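As a quick way to summarize a middle line, the symbols can simply be tallied. The following Python sketch is an illustration only (the names are hypothetical and this is not how Darwin computes its identity or similarity percentages). It also does not distinguish gap columns, which appear blank in the middle line as well.

```python
from collections import Counter

# Symbol-to-label mapping taken from the legend.
QUALITY = {
    "|": "exact match",
    "!": "very good match",
    ":": "good match",
    ".": "poor match",
    " ": "very poor match",
}


def tally_quality(middle_line):
    """Count how often each quality symbol occurs in one middle line."""
    counts = Counter(middle_line)
    return {label: counts.get(symbol, 0) for symbol, label in QUALITY.items()}


# Middle-line fragment under WCEAR / WSSKE from the alignment above.
print(tally_quality("|::.:"))
```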