finding distances for unrelated sequences

Thomas Isenbarger isen at plantpath.wisc.edu
Wed Dec 8 15:50:45 EST 2004


I want to use some sort of clustering method (multidimensional scaling
or stochastic proximity embedding, for instance) to group nucleotide
sequences into clouds of similar sequences on a 2-D plot.  These methods
require a "dissimilarity matrix", which as far as I can tell is the same
as a distance matrix (high scores mean less similarity).
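
To make the input these methods expect concrete, here is a minimal sketch of classical (Torgerson) multidimensional scaling using NumPy. It is only an illustration: the function name `classical_mds` and the three-item toy matrix are made up for this example, and stochastic proximity embedding or a library MDS routine would take the same kind of square, symmetric, zero-diagonal matrix.

```python
import numpy as np

def classical_mds(dissim, n_components=2):
    """Embed items in n_components dimensions from a square dissimilarity matrix."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order; pick the largest ones
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_components]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero
    scale = np.sqrt(np.clip(vals[order], 0.0, None))
    return vecs[:, order] * scale

# Hypothetical toy matrix: items 0 and 1 are close, item 2 is far from both
toy = [[0.0, 1.0, 4.0],
       [1.0, 0.0, 4.0],
       [4.0, 4.0, 0.0]]
coords = classical_mds(toy)  # one (x, y) row per item, ready for a 2-D plot
```

Each row of `coords` is a point on the 2-D plot; items with small dissimilarities land close together.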

I have a set of 700+ sequences that I want to group this way, but the
set:

1.  contains some homologous groups, but
2.  these groups are unrelated to one another, and
3.  the sequences are of different lengths

If the sequences were related and could be trimmed to the same length, I
would do an alignment and then use PHYLIP to create a distance matrix,
but since my sequences are unrelated and cannot really be trimmed to the
same length, I am at a loss for what to do.

For a set with so many unrelated sequences of different lengths, the
only thing I have been able to think of is an all-against-all BLAST to
create a score matrix using the normalised bit score, but this gives
high scores for similarities.  From there, the only thought I had was to
use the reciprocal of the BLAST score as some perverse measure of
distance.
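
As a rough sketch of how such a score matrix could be turned into dissimilarities, the code below uses one possible alternative to the plain reciprocal: each pairwise bit score is scaled by the smaller of the two self-hit scores and subtracted from 1, so that pairs with no BLAST hit get the maximum distance of 1 rather than an undefined 1/0. The function name, the dictionary layout, and the example scores are all assumptions for illustration, not output from a real BLAST run.

```python
def scores_to_distances(ids, bit_scores):
    """Turn pairwise bit scores into a symmetric dissimilarity matrix.

    ids        -- list of sequence identifiers
    bit_scores -- dict mapping (query, subject) to a bit score; it must
                  contain every self-hit (id, id), and pairs with no
                  reported hit are simply absent
    """
    n = len(ids)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            qa, qb = ids[a], ids[b]
            # Scores can differ by search direction, so keep the better one
            raw = max(bit_scores.get((qa, qb), 0.0),
                      bit_scores.get((qb, qa), 0.0))
            # Scale by the smaller self-score so similarity lies in [0, 1]
            denom = min(bit_scores[(qa, qa)], bit_scores[(qb, qb)])
            sim = min(raw / denom, 1.0)
            dist[a][b] = dist[b][a] = 1.0 - sim
    return dist

# Made-up scores: s1 and s2 hit each other, s3 hits nothing but itself
ids = ["s1", "s2", "s3"]
scores = {
    ("s1", "s1"): 200.0, ("s2", "s2"): 100.0, ("s3", "s3"): 150.0,
    ("s1", "s2"): 80.0, ("s2", "s1"): 75.0,
}
dist = scores_to_distances(ids, scores)
```

The result is symmetric with a zero diagonal, which is the shape a dissimilarity-based method expects. Whether this scaling is biologically meaningful for unrelated sequences is a separate question that the sketch does not settle.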

Any ideas?

please email to isen AT plantpath DOT wisc DOT edu

Cheers,
Tom Isenbarger