Sellers Pattern-matching algorithm
allen at sparc.brc.ubc.ca
Mon Mar 22 11:23:21 EST 1993
The concept of distance is non-directional, i.e. d(a,b) = d(b,a).
This is not necessarily what you want in a pattern matching algorithm.
When searching a data base for a target, you want to penalize for
a longer probe than a target, otherwise every short sequence in the
data base will give you a low-distance match, drowning out all the
interesting matches. You do, however, want to see matches where your
probe sequence matches a subsequence of a longer target sequence.
The algorithm you are using seems to be designed with this in mind, with
no penalty for terminal target sequence gaps. If the algorithm is adjusted
to have the same penalty for all terminal gaps, then inserting "confused
PhD" will have the same penalty as deleteing "confusedPhD", and d(a,b)=d(b,a).
Even in evolutionary trees there is directionality, ancestor --> descendant.
Most parsimony algorithms assume non-directional distance, but I don't know
that this reflects reality.
To obtain non-directional distances:
1) use D = min( d(a,b),d(b,a))
2) use if length(a) < length(b) then D=d(a,b) else D=d(b,a)
3) fix the algorithm terminal gap penalties.
Allen Delaney (allen at brc.ubc.ca)
Biomedical Research Centre, U.B.C., Vancouver, B.C. Canada, V6T 1Z3
More information about the Bio-soft