concerning end gaps and anchoring

Doug Eernisse Doug_Ee at um.cc.umich.edu
Fri Sep 3 12:35:32 EST 1993


In article <930903084111.20202612 at BOBCAT.CSC.WSU.EDU> 
Steve Thompson: VADMS genetics, THOMPSON at WSUVMS1.CSC.WSU.EDU writes:
>software.  If you know, by eye or otherwise, where the motifs are that you
>want to force the alignment around, you can add foreign symbols to the
>sequence at corresponding sites in all members of the group.  This works
>best if you can flank your known motif with the foreign symbol but also
>works if you just insert it into a common feature (e.g. this works great
>for absolutely locking in disulphide bridges with protein alignments).
>Then you need to modify the substitution matrix which the program accesses
>to likewise add the foreign symbol.  Give it a substitution value at least
>10X that of identity for your table.  Then when you run the program be
>sure and specify the alternate table.  This works very well for many
>situations, both nucleotide and peptide, and has been successfully used
>after my suggestion by many of my users to align previously "unalignable"
>sequence sets.  Naturally, use an editor to remove the foreign symbols
>after the alignment has been completed.  Give it a try.
>
>                                                   Steve Thompson
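A minimal Python sketch of the recipe quoted above, under stated
assumptions: the foreign symbol "@", the dictionary layout of the matrix,
and the function names are all hypothetical, and no particular alignment
program's table format is implied. Steve does not say how the foreign
symbol should score against ordinary residues; using the lowest score
already in the table is a guess made here.

```python
ANCHOR = "@"  # hypothetical foreign symbol; must not occur in the real data


def insert_anchor(seq, positions, anchor=ANCHOR):
    """Insert `anchor` before each 0-based position of an unaligned sequence."""
    out = seq
    # Work from the right so earlier insertions do not shift later positions.
    for pos in sorted(positions, reverse=True):
        out = out[:pos] + anchor + out[pos:]
    return out


def add_anchor_symbol(matrix, anchor=ANCHOR, factor=10, mismatch=None):
    """Return a copy of `matrix`, a dict keyed by (a, b) residue pairs,
    extended with `anchor` scoring `factor` times the best identity score."""
    residues = sorted({a for a, _ in matrix})
    best_identity = max(matrix[(r, r)] for r in residues)
    if mismatch is None:
        # Assumption: anchor-vs-residue gets the worst score in the table.
        mismatch = min(matrix.values())
    extended = dict(matrix)
    for r in residues:
        extended[(anchor, r)] = mismatch
        extended[(r, anchor)] = mismatch
    extended[(anchor, anchor)] = factor * best_identity
    return extended


# Toy two-letter table, for illustration only.
toy = {("A", "A"): 2, ("A", "C"): -1, ("C", "A"): -1, ("C", "C"): 3}
extended = add_anchor_symbol(toy)
flanked = insert_anchor("ACDCEF", [1, 4])  # flank the motif "CDC"
```

The extended table would then have to be written out in whatever format
the alignment program expects and named as the alternate table when the
program is run.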


Right, this is similar to what I have done, although your manipulation of
the substitution matrix for amino acids is a very nice touch. If you just
want to try adding columns of a special symbol to your alignment and, like
me, you are using a Mac to edit your alignments, you might find the
following tip useful. I have found the freeware version (2.22) of the text
editor BBEdit, which is one of the "Child Apps" that come with Don
Gilbert's SeqApp program, to be useful in this particular case. Actually,
you also need to download one of the many available public-domain BBEdit
extensions written by other authors. I got this one at
"mac.archive.umich.edu" (anonymous ftp _after_ business hours), but it
should also be at Sumex and the various mirrors of these sites. On the
Michigan archives, look in /util/text/ for something like
"BBE_InsertColumns.hqx", which simply allows you to insert columns of tabs
in your data (you can also get the full version of BBEdit 2.22 while you
are there). I have made trivial changes to the included Think C code,
changed the name, and recompiled to make other versions for specific
characters (e.g., my gap symbol, space, "$", or whatever), but it is also
easy enough to globally change all the tabs you inserted in your alignment
to "$$$$$$" or whatever. These may be stripped in a similar global manner
using the built-in "Replace" facility.
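For anyone not on a Mac, the same three editing steps (insert a column of
tabs, globally change the tabs to a marker, strip the marker afterwards)
can be sketched in a few lines of Python. This is only an illustration and
not the BBEdit extension's code; the function names and the toy alignment
are made up here.

```python
def insert_column(rows, col, filler="\t"):
    """Insert `filler` before 0-based column `col` in every alignment row."""
    return [row[:col] + filler + row[col:] for row in rows]


def replace_all(rows, old, new):
    """Global search-and-replace applied to every row."""
    return [row.replace(old, new) for row in rows]


rows = ["ACGT-ACGT",
        "ACGTTACGT",
        "AC-TTACGA"]

marked = insert_column(rows, 4)                # a column of tabs
marked = replace_all(marked, "\t", "$$$$$$")   # tabs become the marker
restored = replace_all(marked, "$", "")        # strip the marker again
```

The final strip is only safe if the marker character does not otherwise
occur in the alignment.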
 
As a general comment, one should be wary of such a method because it is
exceedingly difficult to limit alignment ambiguities to particular columns.
The same applies to those who put bars over "ambiguous" alignment sites,
which are then excluded from a phylogenetic analysis. The problem is that
alternative gap placements may extend into or out of the sites one would
like to treat in a special manner. At least keep those ambiguous sites in
your alignment, perhaps excluding them from some analyses by defining
a "charset" of those sites (PAUP). If you don't, you risk losing track of
your decisions on alignment. Steve's case of locking in disulphide bridges
might be an appropriate a priori justification for deciding where the gaps
should go.
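
As a rough illustration of the charset idea, here is a small Python helper
that turns a list of ambiguous site numbers into a NEXUS-style charset
definition. The set name "ambiguous" and the site numbers are made up, and
which block of the data file the line belongs in depends on the PAUP
version, so check the manual before relying on this.

```python
def to_ranges(columns):
    """Collapse 1-based site numbers into sorted (start, end) runs."""
    runs = []
    for c in sorted(set(columns)):
        if runs and c == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], c)   # extend the current run
        else:
            runs.append((c, c))           # start a new run
    return runs


def charset_line(name, columns):
    """Format the sites as a single charset definition."""
    parts = [f"{a}-{b}" if a != b else str(a) for a, b in to_ranges(columns)]
    return f"charset {name} = {' '.join(parts)};"


# Hypothetical ambiguous sites, numbered from 1 as in the alignment.
line = charset_line("ambiguous", [12, 13, 14, 30, 41, 42])
```

The sites stay in the data file, and the set can then be excluded by name
for a given analysis (something like "exclude ambiguous;") and restored
later, so the alignment decisions remain on record.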

Doug



