I'd like to bring some more minds in on an ongoing discussion here:
What makes a good match between NUCLEIC ACID sequences? I ask about
DNA specifically to keep PAM scores and likely mutations out of it.
If one is doing an error-tolerant comparison of strings that
SHOULD match exactly (as is the case when doing plain text searches
or sequencing fragment assembly) how should one balance length
of match against percent match? Is an exact match of 20 bases
better than a 96% match over 25 bases (24 of 25)? I have seen heuristics
used for this decision, but have never seen any of them backed
up with much discussion.
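
To make the 20-versus-25 question concrete, here is a rough sketch of one
possible yardstick: how likely is each kind of match to arise by chance at a
single ungapped alignment position? Everything in it is an assumption for
illustration, not an established heuristic: bases are taken as independent
and uniformly distributed (each position matches with probability 1/4), only
substitutions are counted, and the function name is made up.

```python
from math import comb


def chance_match_probability(length, max_mismatches, p_match=0.25):
    """Probability that `length` aligned random bases show at most
    `max_mismatches` mismatches at one fixed, ungapped position.

    Assumes independent positions that each match with probability
    p_match (0.25 for uniform A/C/G/T). Illustrative model only.
    """
    p_mismatch = 1.0 - p_match
    # Binomial tail: sum over k = 0 .. max_mismatches mismatches.
    return sum(
        comb(length, k) * p_mismatch**k * p_match ** (length - k)
        for k in range(max_mismatches + 1)
    )


# Exact 20-base match versus 24-of-25 (96%) match.
exact_20 = chance_match_probability(20, 0)
near_25 = chance_match_probability(25, 1)

print(f"exact 20/20 by chance:     {exact_20:.3e}")
print(f"at least 24/25 by chance:  {near_25:.3e}")
print(f"ratio (20/20 over 24/25):  {exact_20 / near_25:.1f}")
```

Under these assumptions the 24-of-25 match comes out about 13 times less
likely to occur by chance than the exact 20-base match, so by this one
measure the longer, imperfect match is the stronger evidence. The model
ignores gaps, skewed base composition, repeats, and the number of positions
searched, any of which could change the answer.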
If you would rather mail me directly, I will post a summary here.

cash at csmil.umich.edu