GCG V. 9.0-GAP

Francois JEANMOUGIN pingouin at crystal.u-strasbg.fr
Fri May 30 02:35:00 EST 1997


In article <5mkcsb$hj1 at rc1.vub.ac.be>,
	gbottu at ben.vub.ac.be (Guy Bottu) writes:
> 	Dear colleagues,
> 
> Allow me to add a few comments :
> 
> - the two sequences have no observable similarity and so, if you run gap
> in its default mode (no penalty for terminal gaps), it will give
> an alignment with a tiny overlap, while if you add the parameter
> -endweight it will give an alignment with a crazy number of gaps. In
> both cases the alignment is not biologically relevant anyway.

	For this alignment, sure. I'm sorry, I can't find the example
I have of such bad gap output. It involved bromodomain-containing
proteins. The GCG8 gap was able to align the bromodomains of the
two proteins, whereas the GCG9 one produced a G-G alignment like the
one shown in this newsgroup. As you can see at:
	http://www-igbmc.u-strasbg.fr/~pingouin/Bromodomain/update9b.msf
the bromodomain is conserved enough through evolution to be aligned.
I can't explain the way GCG9 behaves by the new matrix and endweight
features alone. I will try to reproduce the bug (I persist), but I
don't have much time. Also, I don't know how gap interprets the matrix,
or whether the shift from a -1 to 1 matrix to a -12 to 12 matrix (or
something like that) can make the algorithm go wrong (I'm not being
clear, but it's not clear in my brain either ;-).
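
To make the matrix-range question concrete, here is a minimal sketch in Python. It is not GCG's code: it assumes a plain Needleman-Wunsch score with a linear gap penalty and a toy match/mismatch scheme instead of a real substitution matrix, and the function name and the two short sequences are made up for illustration. Under those assumptions, scaling the matrix and the gap penalty together only rescales the best score, while scaling the matrix alone makes gaps relatively cheap and lets a different, gapped alignment win.

```python
def nw_score(a, b, match, mismatch, gap):
    """Best global alignment score (Needleman-Wunsch, linear gap penalty).

    Hypothetical helper for illustration; GCG gap uses separate gap
    creation and extension penalties and a full substitution matrix.
    """
    # prev[j] holds the best score for a[:i-1] against b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


a, b = "ACGT", "AGCT"

# Small-range scores with a matching gap penalty.
base = nw_score(a, b, match=1, mismatch=-1, gap=-2)

# Everything multiplied by 12: same optimal alignment, score times 12.
both_scaled = nw_score(a, b, match=12, mismatch=-12, gap=-24)

# Only the matrix multiplied by 12: gaps are now cheap compared with
# mismatches, so the optimum moves to an alignment with gaps.
matrix_only = nw_score(a, b, match=12, mismatch=-12, gap=-2)

print(base, both_scaled, matrix_only)
```

So a change of matrix range is harmless only if the gap penalties are rescaled to go with it. Whether this is what happens between GCG8 and GCG9 is not something the sketch can tell.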

>[...] 
> - Often, there are several alternative alignments with the same highest
> possible score. The question which to choose is a difficult and
> certainly not entirely trivial issue. As far as I know, nobody has
> found a satisfactory answer. Fortunately, the ambiguity is most of the
> time in a few spots of the alignment that have many gap positions and
> it reflects an ambiguity of finding out in which order
> deletion/insertion events occurred in the course of evolution.

	Sure, I agree with you. But in the case shown, the second
alignment clearly has a lower score than the first one. Using the same
matrix and the same algorithm, whatever order you give the sequences
in, you should find the same family of alignments with the same score.
The only difference can be in which alignment of the family is shown
to the user, but in our case the two alignments are clearly not
from the same family.
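
The order argument can be checked on a toy model. The following Python sketch is an assumption-laden illustration, not the GCG algorithm: it uses a linear gap penalty, a symmetric match/mismatch score, and a hypothetical `penalize_ends` switch that only loosely mimics the effect of -endweight. With a symmetric scoring scheme, swapping the two sequences transposes the dynamic programming table, so the best score cannot change in either mode.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, penalize_ends=False):
    """Best alignment score with optional free terminal gaps.

    Hypothetical helper: linear gap penalty, toy match/mismatch scores.
    """
    n, m = len(a), len(b)
    edge = gap if penalize_ends else 0      # cost of a leading gap position
    prev = [j * edge for j in range(m + 1)]
    best_last_col = prev[m]                 # best score seen in column m
    for i in range(1, n + 1):
        cur = [i * edge]
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
        best_last_col = max(best_last_col, cur[m])
    if penalize_ends:
        return prev[m]                      # must end in the bottom-right cell
    # Free trailing gaps: the alignment may end anywhere on the last row
    # or the last column.
    return max(best_last_col, max(prev))


x, y = "GATTACA", "GCATGCT"
for ends in (False, True):
    forward = align_score(x, y, penalize_ends=ends)
    backward = align_score(y, x, penalize_ends=ends)
    print(ends, forward, backward, forward == backward)
```

If a program returns different scores for the two orders with the same matrix and parameters, the difference has to come from something other than the choice among equally good alignments.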

> So, I would not say that the program gap is bugged or misfeatured (what
> we cannot say of a lot of other parts of GCG (both  :) and :(  !!)).  

<JOKE> well, hem..., use ClustalX ;-))))</JOKE>

	I persist, and will try to demonstrate it with another example
(if you can wait a few days).

> Also, I find no reason to alias gap to gap -endweight or to return to
> the renormalized PAM250 matrix as a default, but it is very useful to
> give the users an elementary explanation of alignment algorithms and
> scoring schemes, so that they know what they are using.

	OK, I agree with that, but... there is a bug (in my opinion)...

							Francois.
-- 
Francois Jeanmougin     | groupe de bioinformatique / bioinformatics groupe
tel:(+33) 3 88 65 32 71 | IGBMC BP 163 67404 Illkirch France
e-mail : jeanmougin at igbmc.u-strasbg.fr
"C'est pas parcequ'on monte au banc, qu'il faut descendre a jeun."(Thiefaine)



