Next: Probability foundations
Up: Molecular Weight Traces
Previous: Molecular Weight Traces
If our mass measurements were perfect and our sequence database
contained every searched sequence, this would be an
almost trivial problem: search a vector of
weights against all possible vectors of weights computed
from the sequence database.
This problem is known in computer science as multidimensional
search.
Unfortunately, neither assumption holds, for the following reasons:
- (a) The measured molecular mass is subject to a relative
error, in general less than 1%, which is still not precise
enough to identify even very short sequences of amino acids.
- (b) The searched sequence may not appear verbatim in the database;
only a close relative of it may be present.
In this case the searched sequence and the target could differ
through mutations, insertions and deletions.
This will cause some molecular weights to be different.
- (c) Mutations in the database sequence can cause
the digestion to be different, splitting the sequence into more
or fewer fragments.
This will cause a complete mismatch of the weights
involving such fragments.
- (d) Impurities in the sample and in the digesters may produce
spurious data in the searched sample.
- (e) The fragmentation (digestion), although in general accurate,
is not 100% deterministic.
Partial or incorrect digestion is also possible.
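Reason (a) can be made concrete with a small sketch. It enumerates every peptide of the same length as a given one and keeps those whose mass lies within the relative error. The residue masses are the usual average values rounded to two decimals; the function names `peptide_mass` and `indistinguishable` are hypothetical and chosen only for this illustration.

```python
from itertools import product

# Average residue masses in daltons, rounded to two decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.015  # one water molecule is added per free peptide


def peptide_mass(seq):
    """Mass of a peptide: sum of its residue masses plus one water."""
    return sum(RESIDUE_MASS[a] for a in seq) + WATER


def indistinguishable(seq, rel_error=0.01):
    """All peptides of the same length whose mass is within rel_error of seq.

    Brute force over 20**len(seq) candidates, so only meant for very
    short peptides.
    """
    target = peptide_mass(seq)
    hits = []
    for combo in product(RESIDUE_MASS, repeat=len(seq)):
        candidate = "".join(combo)
        if abs(peptide_mass(candidate) - target) <= rel_error * target:
            hits.append(candidate)
    return hits
```

Calling `indistinguishable("SV")` returns, among others, `VS`, `GE` and `AD`: a single weight measured to 1% cannot tell these dipeptides apart.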
For all these reasons we have to choose a
matching method which will tolerate errors both in the sample
and in the database.
The algorithm we will use, for a single digestive enzyme,
can be stated in relatively simple terms:
- (i) Obtain the set of molecular weights of the digested protein
(usually by experimental means).
- (ii) Digest (theoretically) every sequence in the database
and find the molecular weights of the fragments.
- (iii) Compute the probability that a match of the given weights
against the computed ones happens at random.
- (iv) Record the m lowest probabilities.
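Step (ii) might be sketched as follows. The text does not name the enzyme, so the cleavage rule used here (cut after K or R unless the next residue is P, as trypsin does) is an assumption made only for illustration, and the names `digest` and `fragment_weights` are hypothetical.

```python
# Average residue masses in daltons, rounded to two decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.015  # one water molecule is added per free fragment


def digest(sequence):
    """Theoretical digestion with an assumed trypsin-like rule.

    Cuts after every K or R that is not followed by P. Complete
    digestion is assumed; missed cleavages are not modelled.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        following = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and following != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder
    return fragments


def fragment_weights(sequence):
    """Sorted molecular weights of the fragments of one database sequence."""
    return sorted(
        sum(RESIDUE_MASS[a] for a in frag) + WATER for frag in digest(sequence)
    )
```

For instance, `digest("GAKRPSVKLL")` yields the fragments `GAK`, `RPSVK` and `LL`; the R is not a cut point because it is followed by P.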
Taken together, steps (i) to (iv) return the m most likely candidate
sequences from the database.
Analysis of these sequences and their probabilities will
normally reveal whether we have found a match, a hint
or just random noise.
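Steps (iii) and (iv) might be sketched as below. The probability model is a deliberately crude stand-in, not the one this text develops: it assumes that fragment weights are independent and uniformly distributed over a fixed mass range, and it ignores unmatched weights. The names `random_match_probability` and `best_candidates`, the 1% tolerance default and the mass range are all assumptions made for illustration.

```python
import heapq


def random_match_probability(observed, theoretical, rel_error=0.01,
                             mass_range=(50.0, 5000.0)):
    """Crude chance that the observed matches arise at random.

    For each observed weight that matches some theoretical weight within
    rel_error, multiply in the probability that at least one of
    len(theoretical) uniformly distributed weights falls in that window.
    """
    low, high = mass_range
    n = len(theoretical)
    probability = 1.0
    for w in observed:
        tolerance = rel_error * w
        if any(abs(w - t) <= tolerance for t in theoretical):
            p_one = min(1.0, 2.0 * tolerance / (high - low))
            probability *= 1.0 - (1.0 - p_one) ** n
    return probability


def best_candidates(observed, database, m=3):
    """Return the m (probability, name) pairs with the lowest probability.

    `database` maps a sequence name to its list of theoretical
    fragment weights.
    """
    scored = [
        (random_match_probability(observed, weights), name)
        for name, weights in database.items()
    ]
    return heapq.nsmallest(m, scored)
```

A database entry with no matching weight gets probability 1.0 and sinks to the bottom of the ranking, while an entry matching many observed weights gets a very small probability and rises to the top.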
Gaston Gonnet
1998-09-15