PHYLIP protdist with chemical categories.

Brian Foley btf at t10.lanl.gov
Tue May 12 12:43:10 EST 1998

Dear Molecular Evolutionists,

	I have come across a strange result with the PHYLIP
protdist program, using HIV-1 gene and protein sequences with
protdist's option for a chemical categories model of protein
evolution.  I get roughly twice the evolutionary distance with
the chemical categories model that I get with either protdist
under the PAM model or dnadist under the Kimura model.
	My gut feeling is that the protdist chemical categories
model is simply doubling the distances for some reason.  The
more exciting result would be that HIV-1 really has an unusual
pattern of evolution: DNA evolution, and protein evolution under
a PAM model, look fairly normal, but when HIV-1 changes amino
acids it switches chemical categories so often that it shows an
anomalously high rate of evolution under the chemical categories
model.
	I have tested the programs on the HIV-1 Env and Pol
genes, which evolve at very different rates and under very
different pressures (Env is under pressure to change in order
to evade the immune system; Pol is under pressure to remain
conserved and highly functional).  In each case protdist with
the chemical categories model gives 2X or slightly more than 2X
the values given by dnadist and by protdist with the PAM model
(and those two are in close agreement with each other).
	I would now like to find a DNA and protein alignment
from mammals, bacteria, or other organisms, to see whether
the chemical categories model always gives about twice the
distance of the other methods, or whether something about
HIV-1 causes this model to overestimate the distance between
sequences two-fold.  Does anyone have such a data set,
pre-aligned, on the WWW or an FTP site?

	Here are some selected values for Pol:

                          dnadist    protdist   protdist
Sequence pair             (Kimura)   (PAM)      (chemical)
------------------------  --------   --------   ----------
A-SF1703 vs D-Z2Z6        0.0733     0.07366    0.16413
A-SF1703 vs 84-USHAWM     0.1105     0.10611    0.24337
78-US4380 vs 90-USYU2     0.0371     0.05179    0.10764
84-CACAN0 vs 90-USYU2     0.0392     0.05376    0.10738
96-USC07D vs 95-US613P    0.1120     0.10315    0.24611

and for Env:

                          dnadist    protdist   protdist
Sequence pair             (Kimura)   (PAM)      (chemical)
------------------------  --------   --------   ----------
88-USSF19 vs 89-USSF20    0.168      0.17929    0.49718
88-USSF19 vs 84-USRJS4    0.259      0.28936    0.69735
89-USSF20 vs 84-USRJS4    0.224      0.25852    0.65455
86-USSF14 vs 84-USRJS4    0.196      0.21372    0.51492
CONSENSUS vs 87-USACH9    0.091      0.09212    0.25488
CONSENSUS vs 86-USSF8     0.091      0.08757    0.15892
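
To make the "roughly 2X" comparison concrete, the per-pair ratios can be
computed straight from the Pol and Env distances.  What follows is only a
quick Python sketch: the numbers are transcribed from the two tables, and
the names POL, ENV, ratios() and mean() are illustrative, not part of PHYLIP.

```python
"""Ratio of the protdist chemical-categories distance to the PAM and
dnadist (Kimura) distances, using values transcribed from the tables."""

# (sequence pair, dnadist Kimura, protdist PAM, protdist chemical categories)
POL = [
    ("A-SF1703 vs D-Z2Z6", 0.0733, 0.07366, 0.16413),
    ("A-SF1703 vs 84-USHAWM", 0.1105, 0.10611, 0.24337),
    ("78-US4380 vs 90-USYU2", 0.0371, 0.05179, 0.10764),
    ("84-CACAN0 vs 90-USYU2", 0.0392, 0.05376, 0.10738),
    ("96-USC07D vs 95-US613P", 0.1120, 0.10315, 0.24611),
]
ENV = [
    ("88-USSF19 vs 89-USSF20", 0.168, 0.17929, 0.49718),
    ("88-USSF19 vs 84-USRJS4", 0.259, 0.28936, 0.69735),
    ("89-USSF20 vs 84-USRJS4", 0.224, 0.25852, 0.65455),
    ("86-USSF14 vs 84-USRJS4", 0.196, 0.21372, 0.51492),
    ("CONSENSUS vs 87-USACH9", 0.091, 0.09212, 0.25488),
    ("CONSENSUS vs 86-USSF8", 0.091, 0.08757, 0.15892),
]


def ratios(rows):
    """Return (pair, chemical/PAM, chemical/dnadist) for each table row."""
    return [(pair, chem / pam, chem / dna) for pair, dna, pam, chem in rows]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    for gene, rows in (("Pol", POL), ("Env", ENV)):
        table = ratios(rows)
        print(gene)
        for pair, r_pam, r_dna in table:
            print(f"  {pair:24s} chem/PAM = {r_pam:.2f}   chem/dnadist = {r_dna:.2f}")
        print(f"  mean chem/PAM = {mean(r for _, r, _ in table):.2f}")
```

A DNA/protein alignment from mammals or bacteria, run through dnadist and
protdist the same way, could be added as another list of rows to see
whether its chemical/PAM ratio also sits near 2.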

