energy in sequence matching

Tom Holroyd tomh at BAMBI.CCS.FAU.EDU
Tue Oct 1 13:02:21 EST 1991


Energy here is a thermodynamic equilibrium energy.

Humans don't often compare long sequences by hand; they use a
computer program that minimizes some distance function.  This is
analogous to finding the minimum-energy configuration the two DNA
sequences would settle into if they were to coil up with each other.
DNA hybridization uses 'annealing': heat it up, cool it down, heat it
up, cool it down, etc., each time lowering the maximum temperature,
until eventually it has settled into a (hopefully) minimal-energy
configuration with the fewest possible mismatches.  The algorithms
do something similar: they minimize a distance function computed
from the number of mismatches.
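The post describes the distance only in general terms.  As a minimal
sketch, assuming the simplest such distance (every substitution,
insertion, or deletion costs 1, i.e. the Levenshtein edit distance),
the function below finds the minimum exactly by dynamic programming
rather than by anything like annealing.  `edit_distance` is a
hypothetical name, not the routine of any particular program.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions
    needed to turn sequence a into sequence b (unit costs assumed)."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; only one row is kept at a time.
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + mismatch,  # match or substitution
                prev[j] + 1,             # delete a[i-1]
                curr[j - 1] + 1,         # insert b[j-1]
            )
        prev = curr
    return prev[-1]


print(edit_distance("CCC", "CCT"))   # 1: one substitution
print(edit_distance("CCC", "CGCC"))  # 1: one insertion
```

Under these unit costs CCC vs. CCT and CCC vs. CGCC come out equally
distant; telling them apart needs separate penalties for changes and
for insertions.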

So the 'match' between two sequences that you see published is
really one match out of several low-energy (few-mismatch)
configurations.  And the algorithms assign adjustable penalties to
different kinds of mismatch, so changing the penalty for an insertion
relative to the penalty for a substitution (for example, deciding that
CCC vs. CCT, a single substitution, is less of a difference than
CCC vs. CGCC, a single insertion) can sometimes drastically alter the
sequence match.
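To make the penalty effect concrete, here is a minimal sketch of a
Needleman-Wunsch-style global alignment that minimizes total cost.  It
assumes a deliberately simple cost model with one hypothetical
penalty per substitution (`sub_cost`) and one per inserted or deleted
base (`gap_cost`); real programs typically use substitution matrices
and separate gap-opening and gap-extension penalties.

```python
def align(a: str, b: str, sub_cost: int, gap_cost: int) -> tuple[int, str, str]:
    """Global alignment of a and b minimizing total cost.

    sub_cost: penalty for aligning two different bases (a change).
    gap_cost: penalty for each inserted or deleted base.
    Returns (cost, aligned_a, aligned_b) with '-' marking gaps.
    """
    n, m = len(a), len(b)

    def step(i: int, j: int) -> int:
        # Cost of aligning a[i-1] with b[j-1] in one column.
        return 0 if a[i - 1] == b[j - 1] else sub_cost

    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap_cost
    for j in range(1, m + 1):
        cost[0][j] = j * gap_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + step(i, j),
                             cost[i - 1][j] + gap_cost,
                             cost[i][j - 1] + gap_cost)

    # Trace back from the corner, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + step(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + gap_cost:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return cost[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


for sub_cost in (1, 3):
    total, top, bottom = align("CCC", "CCT", sub_cost=sub_cost, gap_cost=1)
    print(f"sub_cost={sub_cost} cost={total}")
    print("  " + top)
    print("  " + bottom)
```

With `sub_cost=1` the best answer is the gapless CCC/CCT alignment at
cost 1.  Raising `sub_cost` to 3 makes two gaps (cost 2) cheaper than
one change, and the alignment becomes CC-C over CCT-, even though the
sequences themselves have not changed.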

Just as in real hybridization, there isn't ONE configuration that
is always obtained; there may be several good matches.  Even
proteins, I think, can adopt multiple secondary and tertiary
configurations.
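Several alignments can even tie exactly for the best score.  As a
minimal sketch, assuming the same simple substitution/gap cost model
(the function name and default penalties are hypothetical), the
dynamic program below counts how many distinct alignments reach the
minimum cost by summing path counts over every tied predecessor cell.

```python
def count_optimal_alignments(a: str, b: str, sub_cost: int = 1,
                             gap_cost: int = 1) -> tuple[int, int]:
    """Return (minimum cost, number of distinct alignments with that cost)."""
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    ways = [[0] * (m + 1) for _ in range(n + 1)]
    ways[0][0] = 1  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            # (cost of arriving from a predecessor, number of ways there)
            options = []
            if i > 0 and j > 0:
                step = 0 if a[i - 1] == b[j - 1] else sub_cost
                options.append((cost[i - 1][j - 1] + step, ways[i - 1][j - 1]))
            if i > 0:
                options.append((cost[i - 1][j] + gap_cost, ways[i - 1][j]))
            if j > 0:
                options.append((cost[i][j - 1] + gap_cost, ways[i][j - 1]))
            best = min(c for c, _ in options)
            cost[i][j] = best
            # Every predecessor that ties for the best cost contributes paths.
            ways[i][j] = sum(w for c, w in options if c == best)
    return cost[n][m], ways[n][m]


print(count_optimal_alignments("CCC", "CC"))  # (1, 3)
```

For CCC vs. CC the single gap can sit under any of the three Cs, so
three different alignments are equally good, and a program reporting
just one of them is making an arbitrary choice.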

My experience is more with generic pattern-matching techniques,
not DNA matching specifically, so if anybody out there wants to
describe sequence matching in detail, maybe this is a good
time.

Tom Holroyd
Center for Complex Systems
Florida Atlantic University
tomh at bambi.ccs.fau.edu




More information about the Mol-evol mailing list