The GlobalAlign Function

The first Darwin routine for creating an alignment between sequences is the GlobalAlign routine.

Calling Sequences:
GlobalAlign(m, DM)
Parameters:

m	:	`Match`
DM	:	`DayMatrix`

Returns: Match

Synopsis: Using dynamic programming with the Dayhoff matrix DM, GlobalAlign creates an alignment between the sequences m₁ and m₂ contained in the Match structure m.

The similarity scoring in the dynamic programming is done relative to DM; therefore, the PAM distance of the alignment is the PAM distance of DM.

If the Length1 and Length2 fields of the Match structure m have not been set (i.e. they equal 0), GlobalAlign finds the lengths that maximize the score. Note that this score is not necessarily the global best score; it is the maximum score found by proceeding left to right through m₁ and m₂ and cutting the right tails (if necessary). If the lengths are defined, GlobalAlign finds the alignment that maximizes the similarity score and forces the overall length of the alignment to be max(length(m₁), length(m₂)).
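The two modes described above can be illustrated with a small Python sketch. It is not Darwin's implementation: it assumes a linear gap penalty and a toy match/mismatch score in place of a Dayhoff matrix, and the names `align_table`, `fixed_length_score`, `best_prefix_lengths` and `toy_score` are invented for illustration. With both lengths fixed, the score is read from the corner of the dynamic programming table; with the lengths left open, the best-scoring cell determines how much of each right tail is cut.

```python
def toy_score(a, b):
    """Toy stand-in for a Dayhoff matrix lookup (assumption, not real PAM scores)."""
    return 2 if a == b else -1


def align_table(s1, s2, score, gap=-4):
    """Fill the dynamic programming table for alignments anchored at the left end.

    table[i][j] is the best score for aligning s1[:i] with s2[:j],
    using a linear gap penalty (a simplifying assumption).
    """
    n, m = len(s1), len(s2)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i * gap
    for j in range(1, m + 1):
        table[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            table[i][j] = max(
                table[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # align two residues
                table[i - 1][j] + gap,                              # gap in s2
                table[i][j - 1] + gap,                              # gap in s1
            )
    return table


def fixed_length_score(s1, s2, score, gap=-4):
    """Lengths given: both sequences must be consumed completely."""
    return align_table(s1, s2, score, gap)[len(s1)][len(s2)]


def best_prefix_lengths(s1, s2, score, gap=-4):
    """Lengths not set: pick the prefix lengths with the highest score.

    Returns (score, length1, length2); the right tails beyond those
    lengths are cut off.
    """
    table = align_table(s1, s2, score, gap)
    return max(
        (table[i][j], i, j)
        for i in range(len(s1) + 1)
        for j in range(len(s2) + 1)
    )


print(fixed_length_score("ACGTTT", "ACGAAA", toy_score))   # whole sequences aligned
print(best_prefix_lengths("ACGTTT", "ACGAAA", toy_score))  # dissimilar tails cut
```

In this toy case the two sequences share only their first three characters, so leaving the lengths open yields a higher score over a shorter alignment than forcing both sequences to be aligned in full.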

The GlobalAlign function is used when

1. we have a Match with no lengths,
2. we have a Match without a score,
3. we have a Match with a similarity score for a similarity matrix that we would like to redo with a different matrix.

> DB:=ReadDb('Sample/SH2'):
> CreateDayMatrices():                       # calculate matrix DM
> m1 := Match(op(Sequence(Entry(1))), op(Sequence(Entry(2))));
> m1 := GlobalAlign(m1, DM);                 # DM is PAM 250
m1 := Match(1127.6,367,1338,492,618,250)
> print(m1);

lengths=492,618 simil=1127.6, PAM_dist=250, offsets=367,1338,
  identity=41.3%, similarity=23.5%
ID=ABL1_CAEEL   AC=P03949;   DE=TYROSINE-PROTEIN KINASE 
ABL-1 (EC 2.7.1.112) (FRAGMENT).   OS=CAENORHABDITIS ELEGANS.   
ID=ABL2_HUMAN   AC=P42684;   DE=TYROSINE-PROTEIN KINASE ABL2 
(EC 2.7.1.112) (TYROSINE KINASE ARG).   OS=HOMO SAPIENS (HUMAN).   
NNE_____________________________________________________________________________
.::                                                                             
MGQQVGRVGEAPGLQQPQPRGIRGSSAARPSGRRRDPAGRTTETGFNIFTQHDHFASCVEDGFEGDKTGGSSPEALHRPY

______________WCEAR_____________________________________LYSTRKNDASNQRRLGEIGWVPSN
              |::.:                                     | .::::! |:.|..:..||||||
GCDVEPQALNEAIRWSSKENLLGATESDPNLFVALYDFVASGDNTLSITKGEKLRVLGYNQNGEWSEVRSKNGQGWVPSN

FIAPYNSLDKYTWYHGKISRSDSEAILGSGITGSFLVRESETSIGQYTISVRHDGRVFHYRINVDNTEKMFITQEVKFRT
!|:|.|||!|::||||.!|||.:|.!|:| |:|||||||||:| ||.:||:|:!|||!|||||:....|:!!|.|.!|.|
YITPVNSLEKHSWYHGPVSRSAAEYLLSSLINGSFLVRESESSPGQLSISLRYEGRVYHYRINTTADGKVYVTAESRFST

LGELVHHHSVHADGLICLLMYPASKKDKGRGLFSLSPNAPDEWELDRSEIIMHNKLGGGQYGDVYEGYWKRHDCTIAVKA
|:|||||||:.||||!..|.|||:| :|.. :!::|| ..|:||!!|:!|.|::||||||||!||.|.||!::.|!|||:
LAELVHHHSTVADGLVTTLHYPAPKCNKPT_VYGVSP_IHDKWEMERTDITMKHKLGGGQYGEVYVGVWKKYSLTVAVKT

LKEDAMPLHEFLAEAAIMKDLHHKNLVRLLGVCTHEAPFYIITEFMCNGNLLEYLRRTDKSLLPPIILVQMASQIASGMS
||||:|.::|||.|||!||!!:|.|||:||||||.|:||||!||!| .||||!|||:.:!: :::!!|:.||:||:|:|:
LKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTLEPPFYIVTEYMPYGNLLDYLRECNREEVTAVVLLYMATQISSAME

YLEARHFIHRDLAARNCLVSEHNIVKIADFGLARFMKEDTYTAHAGAKFPIKWTAPEGLAFNTFSSKSDVWAFGVLLWEI
|||.!:|||||||||||||:|::!||!|||||:|:|:.|||||||||||||||||||:||!||||.||||||||||||||
YLEKKNFIHRDLAARNCLVGENHVVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESLAYNTFSIKSDVWAFGVLLWEI

ATYGMAPYPGVELSNVYGLLENGFRMDGPQGCPPSVYRLMLQCWNWSPSDRPRFRDIHFNLENLISSNSLNDEVQKQLKK
|||||:||||!!||:||:|||:|!||!.|:||||:||:||..||:|||:|||.|.!.| .:|:!:.::|!:!||.::|.!
ATYGMSPYPGIDLSQVYDLLEKGYRMEQPEGCPPKVYELMRACWKWSPADRPSFAETHQAFETMFHDSSISEEVAEELGR

NNDKKLESDKRRSNVRERSDSKSRHSSHHDRDRDRESLHSRNSNPEIPNRSFIRTDDSVS
..:::........ .. .|::!:.:::.::!!: :.:.::.:::::....:|||..::.|
AASSSSVVPYLPRLPILPSKTRTLKKQVENKENIEGAQDATENSASSLAPGFIRGAQASS

The similarity score for this alignment is extremely high at 1127.6. Because the score is ten times the base-10 logarithm of the likelihood ratio, this implies that the two sequences are 10^112.76 times more likely to derive from a common ancestor than to have been aligned by chance.
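The step from the score 1127.6 to the factor 10^112.76 can be checked numerically. The sketch below assumes that a similarity score is 10·log10 of the likelihood ratio, which is what those two figures imply; the function names are illustrative and not part of Darwin.

```python
import math


def odds_exponent(similarity_score):
    """Return x such that the likelihood ratio is 10**x.

    Assumes score = 10 * log10(ratio), as implied by the example above.
    """
    return similarity_score / 10.0


def score_from_ratio(ratio):
    """Inverse direction under the same assumption."""
    return 10.0 * math.log10(ratio)


print(f"likelihood ratio = 10^{odds_exponent(1127.6):.2f}")
print(score_from_ratio(100.0))  # a ratio of 100 corresponds to a score of 20
```

Working with the exponent rather than the ratio itself keeps the numbers readable for scores of this size.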

The presentation of the alignment information is self-explanatory. The line between the two sequences gives a quick graphical indication of the quality of the alignment: each character in this middle line indicates the quality of the match at that position.

`|`	exact match
`!`	very good match
`:`	good match
`.`	poor match
` `	(blank) very poor match

An intuitive rule is that the closer the character is to the vertical bar, the better the match is.
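As a rough illustration of how such a middle line could be produced, the Python sketch below maps each aligned pair to one of the five symbols. The source does not state the score cutoffs Darwin uses, so the cutoffs and the pair scores here are invented purely for the example, as are the names `match_line` and `toy_pair_score`. Positions involving a gap (`_`) are left blank, as in the printed alignment above.

```python
# Hypothetical pair scores (not real Dayhoff values), used only for this example.
TOY_PAIRS = {("I", "V"): 3.0, ("K", "R"): 1.0, ("A", "S"): 0.2, ("W", "G"): -4.0}


def toy_pair_score(a, b):
    """Symmetric lookup in the toy table; unknown pairs get a low score."""
    return TOY_PAIRS.get((a, b), TOY_PAIRS.get((b, a), -1.0))


def match_line(top, bottom, score, cutoffs=((2.0, "!"), (0.5, ":"), (0.0, "."))):
    """Build the quality line for two aligned strings of equal length.

    cutoffs is a descending list of (minimum score, symbol); the values
    are assumptions, not Darwin's actual thresholds.
    """
    symbols = []
    for a, b in zip(top, bottom):
        if "_" in (a, b):
            symbols.append(" ")          # gap position
        elif a == b:
            symbols.append("|")          # exact match
        else:
            s = score(a, b)
            for cutoff, symbol in cutoffs:
                if s >= cutoff:
                    symbols.append(symbol)
                    break
            else:
                symbols.append(" ")      # very poor match
    return "".join(symbols)


top, bottom = "AIKAW_", "AVRSGC"
print(top)
print(match_line(top, bottom, toy_pair_score))
print(bottom)
```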

Gaston Gonnet
1998-09-15