concerning end gaps and anchoring
Steve Thompson: VADMS genetics
THOMPSON at WSUVMS1.CSC.WSU.EDU
Fri Sep 3 10:41:11 EST 1993
Hello Bio-Soft -
In Message-Id <9309031002.AA21206 at net.bio.net> Doug Eernisse, (Hi Doug!)
Doug_Ee at um.cc.umich.edu, quotes Wagner Fontes and proposes a solution:
>In article <9309021947.AA01103 at net.bio.net> Wagner Fontes,
>WAGNERF at BRUNB.BITNET writes:
>>I'm trying to align peptides using Clustal V, but found a problem:
>>With a low gap penalty, to allow the peptide to be inserted anywhere
>>in the sequence, the software inserts long gaps IN the peptide.
>>Is there a way to increase the gap penalty only for the internal gaps,
>>allowing long gaps at the beginning and/or end of the sequence?
>I think this is a general problem for multiple sequence alignments
>in which the sequences are poorly conserved or contain missing
>data near some of their ends. We need algorithms that allow anchoring
>of portions of the alignment. Is there a way you can artificially
>make the ends more highly matching (temporarily) for the purpose
>of matching the central portions, then remove the central portion
>back to the unmodified alignment? This problem is why I have
>never gotten very far with a nonmanual approach to aligning
>mixtures of partial and complete 18S rRNA sequences.
This is a crucial problem in all alignments, especially multiple ones.
Wagner's problem points to about my only complaint with Clustal V: it has no feature
that I am aware of to allow the program to ignore end gap weighting. GCG's
PileUp, which pretty much is the same type of algorithm, by default ignores end
gap weights and, therefore, is much more appropriate in instances where all
sequences do not begin at a common site. PileUp also allows one to specify end
gap weighting if desired.
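To make "ignoring end gap weighting" concrete, here is a minimal Python sketch of a pairwise global (Needleman-Wunsch) alignment with a `free_end_gaps` switch. It is not the code used by Clustal V or PileUp. The linear gap cost, the match/mismatch scores, and the name `align` are assumptions chosen for illustration. With the switch on, leading and trailing gaps cost nothing while internal gaps still pay the full penalty, so the internal gap penalty can stay high without penalizing the overhanging ends.

```python
def align(a, b, match=2, mismatch=-1, gap=-2, free_end_gaps=False):
    """Pairwise global alignment (Needleman-Wunsch) with a linear gap cost.

    If free_end_gaps is True, leading and trailing gaps cost nothing
    (end gap weighting is ignored); internal gaps still cost `gap`.
    Scoring values are illustrative assumptions, not any program's defaults.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = 0 if free_end_gaps else i * gap  # leading gaps in b
    for j in range(1, m + 1):
        score[0][j] = 0 if free_end_gaps else j * gap  # leading gaps in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    if free_end_gaps:
        # Trailing gaps are free too, so the alignment may end anywhere
        # in the last row or the last column of the table.
        ends = [(score[n][j], n, j) for j in range(m + 1)]
        ends += [(score[i][m], i, m) for i in range(n + 1)]
        best, i, j = max(ends)
    else:
        best, i, j = score[n][m], n, m

    # Any overhang past the end point becomes trailing gaps.
    out_a = [a[i:] + "-" * (m - j)]
    out_b = ["-" * (n - i) + b[j:]]
    # Trace back through the table to recover the aligned columns.
    while i > 0 and j > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    # Whatever is left at the start becomes leading gaps.
    out_a.append(a[:i] + "-" * j)
    out_b.append("-" * i + b[:j])
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `align("PEPTIDE", "AAAAPEPTIDEAAAA", free_end_gaps=True)` scores 14 with the peptide placed contiguously as `----PEPTIDE----`. With end gaps weighted, the best score drops to -2 because the eight overhanging positions are charged as ordinary gaps.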
Doug's point, however, raises an entirely different issue. How can a program
find and align strong "motifs" within sequences if the overall similarity is
very low? Supposedly PIMA works well at this but I have not had a chance to
play with it since I do not have a UNIX station at my disposal. However, I
have used a TRICK in forcing this type of alignment to work with other
software. If you know, by eye or otherwise, where the motifs are that you want
to force the alignment around, you can add foreign symbols to the sequence at
corresponding sites in all members of the group. This works best if you can
flank your known motif with the foreign symbol but also works if you just
insert it into a common feature (e.g. this works great for absolutely locking
in disulphide bridges with protein alignments). Then you need to modify the
substitution matrix which the program accesses to likewise add the foreign
symbol. Give it a substitution value at least 10X that of an identity match in
your table. Then, when you run the program, be sure to specify the alternate table.
This works very well for many situations, both nucleotide and peptide, and has
been used successfully by many of my users, at my suggestion, to align
previously "unalignable" sequence sets. Naturally, use an editor to remove the
foreign symbols after the alignment has been completed. Give it a try.
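The anchoring trick can be sketched in a few lines of Python. This is a toy model, not how GCG or Clustal store or read their substitution tables. The anchor character `#`, the dict-of-pairs matrix, and the helper names (`insert_anchor`, `add_anchor_to_matrix`, `column_score`, `strip_anchors`) are all hypothetical. Scoring an anchor against an ordinary residue with the matrix's worst value is likewise an assumption, since only the anchor-to-anchor value (at least 10X identity) is specified above. `strip_anchors` replaces each anchor with a gap and then drops columns that are gaps in every row, so removing the symbols does not shift any residue out of register.

```python
ANCHOR = "#"  # hypothetical foreign symbol; must not occur in any real sequence


def insert_anchor(seq, pos, symbol=ANCHOR):
    """Insert the anchor symbol before 0-based position `pos` of `seq`."""
    return seq[:pos] + symbol + seq[pos:]


def add_anchor_to_matrix(matrix, symbol=ANCHOR, factor=10):
    """Return a copy of `matrix` (a dict keyed by residue pairs) extended with
    the anchor symbol. Anchor-vs-anchor scores `factor` times the best identity
    score; anchor-vs-residue gets the matrix's worst score (an assumption)."""
    best_identity = max(v for (x, y), v in matrix.items() if x == y)
    worst = min(matrix.values())
    extended = dict(matrix)
    for r in {x for x, _ in matrix}:
        extended[(symbol, r)] = worst
        extended[(r, symbol)] = worst
    extended[(symbol, symbol)] = factor * best_identity
    return extended


def column_score(row_a, row_b, matrix, gap=-2):
    """Score two already-aligned rows column by column (linear gap cost)."""
    total = 0
    for x, y in zip(row_a, row_b):
        total += gap if "-" in (x, y) else matrix[(x, y)]
    return total


def strip_anchors(rows, symbol=ANCHOR, gap="-"):
    """Remove anchors after alignment without shifting any residue: turn each
    anchor into a gap, then drop columns that are gaps in every row."""
    rows = [row.replace(symbol, gap) for row in rows]
    keep = [k for k in range(len(rows[0])) if any(row[k] != gap for row in rows)]
    return ["".join(row[k] for k in keep) for row in rows]


if __name__ == "__main__":
    # Toy DNA table: identity 5, mismatch -4 (illustrative values only).
    dna = {(x, y): 5 if x == y else -4 for x in "ACGT" for y in "ACGT"}
    table = add_anchor_to_matrix(dna)
    a = insert_anchor("ACGTTGCA", 4)  # "ACGT#TGCA"
    b = insert_anchor("ACGTGCA", 3)   # "ACG#TGCA"
    print(column_score(a, "ACG-#TGCA", table))  # anchors in one column: 83
    print(column_score(a, "ACG#-TGCA", table))  # anchors out of register: 29
    print(strip_anchors([a, "ACG-#TGCA"]))      # ['ACGTTGCA', 'ACG-TGCA']
```

Because the anchor pair outscores any run of ordinary matches, an aligner that maximizes this score is pulled toward placing the anchors in the same column, which is the effect described above.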
Steven M. Thompson
Consultant in Molecular Genetics and Sequence Analysis
VADMS (Visualization, Analysis & Design in the Molecular Sciences) Laboratory
Washington State University, Pullman, WA 99164-1224, USA
AT&Tnet: (509) 335-0533 or 335-3179 FAX: (509) 335-0540
BITnet: THOMPSON at WSUVMS1 or STEVET at WSUVM1
INTERnet: THOMPSON at wsuvms1.csc.wsu.edu