When two sequences align well except at the ends, it is sometimes desirable to ignore these tails and align the two subsequences (contiguous stretches of the original sequences) which align best. The alignment from the previous subsection between the tyrosine protein kinases ABL1_CAEEL (C. elegans) and ABL2_HUMAN (human) is extremely poor at the beginning. Only three residues align before an extremely long gap, which is interrupted by a five-residue alignment before yet another long gap.
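The LocalAlign function performs such a subsequence alignment, or local alignment. It is an implementation of the classic Smith-Waterman algorithm [25], a straightforward variant of dynamic programming in which the score of a partial alignment is never allowed to drop below zero, so that the best alignment may start and end anywhere in either sequence. This simple structure makes it fast in practice.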
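The core Smith-Waterman recurrence can be sketched in a few lines of Python. This is not Darwin's LocalAlign: it is a minimal sketch assuming a toy scoring scheme (the hypothetical parameters `match`, `mismatch` and `gap`, with a linear gap penalty) in place of the Dayhoff matrix DM and Darwin's own gap costs. Gaps are marked with `_`, as in the printed alignments.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (best_score, a_part, b_part) for the best local alignment of a and b.

    Toy scoring (assumption): fixed match/mismatch scores and a linear gap
    penalty, NOT the Dayhoff matrix DM or Darwin's gap costs.
    Runs in O(len(a) * len(b)) time and memory.
    """
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            diag = H[i - 1][j - 1] + s   # align a[i-1] with b[j-1]
            up = H[i - 1][j] + gap       # a[i-1] against a gap
            left = H[i][j - 1] + gap     # b[j-1] against a gap
            # The Smith-Waterman step: never go below zero, so a poor
            # prefix is dropped and a fresh local alignment can start here.
            H[i][j] = max(0, diag, up, left)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("_"); i -= 1
        else:
            out_a.append("_"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `smith_waterman("XXXXWVPSNYYYY", "QQWVPSNQQ")` returns `(10, "WVPSN", "WVPSN")`: the unrelated flanks are simply left out, which is the behaviour that lets a local alignment discard poor tails. With an internal gap, `smith_waterman("WVPSN", "WVPASN")` returns `(8, "WVP_SN", "WVPASN")`.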
> DB:=ReadDb('Sample/SH2');
> CreateDayMatrices(); # calculate matrix DM
> m1 := Match(op(Sequence(Entry(1))), op(Sequence(Entry(2)))):
> Glob_m1 := GlobalAlign(m1, DM);
Glob_m1 := Match(1127.6,367,1338,492,618,250)
> Loc_m1 := LocalAlign(m1,DM);
Loc_m1 := Match(1330.5,378,1477,481,479,250)
> print(");
lengths=481,479 simil=1330.5, PAM_dist=250, offsets=378,1477, identity=52.8%, similarity=29.3%
ID=ABL1_CAEEL AC=P03949; DE=TYROSINE-PROTEIN KINASE ABL-1 (EC 2.7.1.112) (FRAGMENT). OS=CAENORHABDITIS ELEGANS.
ID=ABL2_HUMAN AC=P42684; DE=TYROSINE-PROTEIN KINASE ABL2 (EC 2.7.1.112) (TYROSINE KINASE ARG). OS=HOMO SAPIENS (HUMAN).

TRKNDASNQRRLGEIGWVPSNFIAPYNSLDKYTWYHGKISRSDSEAILGSGITGSFLVRESETSIGQYTISVRHDGRVFH
::::! |:.|..:..||||||!|:|.|||!|::||||.!|||.:|.!|:| |:|||||||||:| ||.:||:|:!|||!|
NQNGEWSEVRSKNGQGWVPSNYITPVNSLEKHSWYHGPVSRSAAEYLLSSLINGSFLVRESESSPGQLSISLRYEGRVYH

YRINVDNTEKMFITQEVKFRTLGELVHHHSVHADGLICLLMYPASKKDKGRGLFSLSPNAPDEWELDRSEIIMHNKLGGG
||||:....|:!!|.|.!|.||:|||||||:.||||!..|.|||:| :|.. :!::|| ..|:||!!|:!|.|::|||||
YRINTTADGKVYVTAESRFSTLAELVHHHSTVADGLVTTLHYPAPKCNKPT_VYGVSP_IHDKWEMERTDITMKHKLGGG

QYGDVYEGYWKRHDCTIAVKALKEDAMPLHEFLAEAAIMKDLHHKNLVRLLGVCTHEAPFYIITEFMCNGNLLEYLRRTD
|||!||.|.||!::.|!|||:||||:|.::|||.|||!||!!:|.|||:||||||.|:||||!||!| .||||!|||:.:
QYGEVYVGVWKKYSLTVAVKTLKEDTMEVEEFLKEAAVMKEIKHPNLVQLLGVCTLEPPFYIVTEYMPYGNLLDYLRECN

KSLLPPIILVQMASQIASGMSYLEARHFIHRDLAARNCLVSEHNIVKIADFGLARFMKEDTYTAHAGAKFPIKWTAPEGL
!: :::!!|:.||:||:|:|:|||.!:|||||||||||||:|::!||!|||||:|:|:.|||||||||||||||||||:|
REEVTAVVLLYMATQISSAMEYLEKKNFIHRDLAARNCLVGENHVVKVADFGLSRLMTGDTYTAHAGAKFPIKWTAPESL

AFNTFSSKSDVWAFGVLLWEIATYGMAPYPGVELSNVYGLLENGFRMDGPQGCPPSVYRLMLQCWNWSPSDRPRFRDIHF
|!||||.|||||||||||||||||||:||||!!||:||:|||:|!||!.|:||||:||:||..||:|||:|||.|.!.|
AYNTFSIKSDVWAFGVLLWEIATYGMSPYPGIDLSQVYDLLEKGYRMEQPEGCPPKVYELMRACWKWSPADRPSFAETHQ

NLENLISSNSLNDEVQKQLKKNNDKKLESDKRRSNVRERSDSKSRHSSHHDRDRDRESLHSRNSNPEIPNRSFIRTDDSV
.:|:!:.::|!:!||.::|.!..:::........ .. .|::!:.:::.::!!: :.:.::.:::::....:|||..::.
AFETMFHDSSISEEVAEELGRAASSSSVVPYLPRLPILPSKTRTLKKQVENKENIEGAQDATENSASSLAPGFIRGAQAS

S
|
S
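The Match values printed above can be decoded with a little arithmetic. The sketch below is Python rather than Darwin; it assumes the field order Match(similarity, offset1, offset2, length1, length2, PAM distance), read off the printed output (simil, offsets, lengths, PAM_dist), and that similarity scores are 10·log10 odds ratios, so that a score S corresponds to odds of 10^(S/10). The helper names `odds_exponent` and `end_positions` are hypothetical.

```python
# Match(similarity, offset1, offset2, length1, length2, PAM distance),
# field order assumed from the printed Darwin output.
glob = (1127.6, 367, 1338, 492, 618, 250)   # GlobalAlign result
loc = (1330.5, 378, 1477, 481, 479, 250)    # LocalAlign result

def odds_exponent(score: float) -> float:
    """Assuming score = 10*log10(odds), return log10 of the odds ratio."""
    return score / 10

def end_positions(m):
    """offset + length for each sequence: one past the last aligned residue."""
    _, off1, off2, len1, len2, _ = m
    return off1 + len1, off2 + len2

gain = loc[0] - glob[0]
print(f"score gain = {gain:.1f}")                                  # 202.9
print(f"local odds = 10^{odds_exponent(loc[0]):.2f}")              # 10^133.05
print(f"local vs global = 10^{odds_exponent(gain):.2f}")           # 10^20.29
print("ends:", end_positions(glob), end_positions(loc))            # (859, 1956) (859, 1956)
print("trimmed from the start:", loc[1] - glob[1], loc[2] - glob[2])  # 11 139
```

Both alignments end at the same positions, so the local alignment only trims the beginning: 11 residues of ABL1_CAEEL and 139 of ABL2_HUMAN. The difference, 139 - 11 = 128, equals 91 + 37, the combined length of the two leading gaps in the global alignment.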
The similarity score has climbed by more than 200 points (it is now
10^133.05 times more likely that these sequences share a common ancestor than
that they are simply a random alignment). Comparing the two alignments, one
can see that the first two gaps (plus three extra residues of low-quality
alignment) have been removed. The two gaps in the original alignment
created with GlobalAlign were of lengths 91 and 37, respectively. These contributed