
dna versus protein

Brian Foley btf at t10.lanl.gov
Fri Jan 16 16:08:33 EST 1998

Maria Silvina Fornasari wrote:
> I would like to know if anyone knows about any paper related 
> with the reliability to infer a given phylogeny using dna
> or protein sequences.

Leitner T, et al.
    Accurate reconstruction of a known HIV-1 transmission history by
    phylogenetic tree analysis.
    Proc Natl Acad Sci U S A. 1996 Oct 1; 93(20): 10864-10869.
    PMID: 8855273; UI: 97008097.

> Another question is about how can I estimate the saturation 
> of sites using protein sequences. Is it enough to say that 
> they are above 25% of identity to be sure that they are not 
> saturated?

	I don't think so.  Protein coding regions can be saturated 
with mutations and still have a very high level of sequence
identity due to purifying selection.  Some regions of the HIV
genome encode 3 different proteins (one in each of the 3 forward
frames) so there are no "silent" sites.  Even in regions in
which only one frame is used, there are evolutionary constraints
on the protein, constraints on overall nucleotide ratio (HIV is
A-rich and C-poor), constraints on CpG DNA methylation sites, and
other constraints.
	Total accumulation of mutations is not time-linear.  We
see approximately 1% evolution per year, but after 50 years no
two genomes are 50% divergent.  Variable sites flip-flop between
all 4 bases at a high rate, while selection weeds out any mutation
at other sites.
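
	To illustrate why divergence is not time-linear, here is a
rough sketch in Python.  It is not the method used for the HIV
data; it assumes a simple Jukes-Cantor model (all 4 bases equally
likely) plus a hypothetical "variable_fraction" parameter for the
share of sites that selection allows to change.  The function name
and the 0.3 value are made up for illustration only.

```python
import math


def observed_divergence(subs_per_site, variable_fraction=1.0):
    """Expected fraction of sites that differ between two sequences.

    subs_per_site: substitutions that actually occurred, averaged
        over ALL sites in the alignment.
    variable_fraction: share of sites free to change (hypothetical
        parameter); the remaining sites are treated as invariant.
    """
    if not 0.0 < variable_fraction <= 1.0:
        raise ValueError("variable_fraction must be in (0, 1]")
    # Every substitution lands on a variable site, so those sites
    # are hit more often than the alignment-wide average suggests.
    per_variable_site = subs_per_site / variable_fraction
    # Jukes-Cantor: repeated hits at one site hide earlier changes,
    # so observed differences level off at 3/4 of the variable sites.
    jc = 0.75 * (1.0 - math.exp(-4.0 * per_variable_site / 3.0))
    return variable_fraction * jc


if __name__ == "__main__":
    for d in (0.01, 0.1, 0.5, 1.0, 2.0):
        all_free = observed_divergence(d)
        constrained = observed_divergence(d, variable_fraction=0.3)
        print(f"actual {d:4.2f}  observed (all sites free) {all_free:.3f}"
              f"  observed (30% of sites free) {constrained:.3f}")
```

	Under these assumptions, 0.5 substitutions per site always
shows up as less than 50% observed difference, and the gap widens
as fewer sites are free to vary.  Percent identity alone therefore
says little about whether the variable sites are saturated.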

|Brian T. Foley               btf at t10.lanl.gov                       |
|HIV Database                 (505) 665-1970                         |
|Los Alamos National Lab      http://hiv-web.lanl.gov/index.html     |
|Los Alamos, NM 87544  U.S.A. http://www.t10.lanl.gov/~btf/home.html |

More information about the Mol-evol mailing list
