James O. McInerney PhD j.mcinerney at
Tue Jan 23 07:51:44 EST 1996

>I have a question for any of you who are familiar with the use of maximum
>likelihood analysis in Felsenstein's PHYLIP package:  How should I set
>the "categories" option for defining how many categories of substitution
>sites there are and the relative rates for each?  I'm doing a DNAML and
>DNAMLK analysis of coding regions, so I was thinking that there should be
>2 categories of rates, one for the synonymous substitutions and one for
>the non-synonymous substitutions.  An average value for this ratio (for
>42 genes, Nei, 1987, p.80) is about 5.3:1.  Should I use this ratio, or
>should I empirically determine what it is for my sequences.  If I decide
>to determine this, what program might I use?


Rates of synonymous and non-synonymous substitutions vary quite
considerably depending on one or more of a number of factors.  The base
composition of the genome (either as a whole or in the immediate vicinity)
may have an effect on the rate of synonymous base changes.  If, say, a
bacterium has a mutational bias towards a high G+C content, then this may
keep the GC3s (GC content at the third position of synonymous codons) very
high.  If all the close relatives of this organism also have high G+C
genomes, then the number of synonymous changes between these genes
may be fairly low.  If, on the other hand, there is a drift away from high
GC3s values, then the rate of synonymous substitution may increase.

In organisms that have had a large long-term effective population size
(which I won't get into), the highly expressed genes are generally
restricted to an 'optimal' subset of codons.  Evolutionary rates are
(generally) lowest in these kinds of genes.

The rate of non-synonymous substitution is usually a function of how
conserved the gene is.  A highly conserved gene, where the amino acids do not
change substantially over time, will obviously have a low rate of
non-synonymous substitution.
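
As a rough first look at your own sequences, something like the following
Python sketch could tally synonymous and non-synonymous differences between
two aligned, in-frame coding sequences.  It is only an illustration: it
assumes the standard genetic code, looks only at codons that differ at exactly
one position, and returns raw counts that are not corrected for the number of
synonymous and non-synonymous sites or for multiple hits.  It is not Ina's
method (or anyone else's published estimator), and the names used are
hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_differences(seq1, seq2):
    """Return (synonymous, non-synonymous) counts of single-base codon differences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE:
            continue  # skip codons with gaps or ambiguity codes
        if sum(a != b for a, b in zip(c1, c2)) != 1:
            continue  # identical codons, or more than one change (ambiguous path)
        if CODE[c1] == CODE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


if __name__ == "__main__":
    # GCC->GCT is synonymous (Ala); AAA->AGA is non-synonymous (Lys->Arg).
    print(count_differences("ATGGCCAAA", "ATGGCTAGA"))
```

Raw counts like these only show which way the balance tips; a proper
estimate of the two rates needs one of the published methods.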

Paul Sharp, Ken Wolfe, Wen-Hsiung Li and various others have worked
on this subject quite a lot.  See the end of the message for a (short)
bibliography.

I haven't exhausted all the instances or caveats that pertain to this
question but perhaps this will put you in the picture.  For a
program...Yasuo Ina says to send him an email:

yina at

This is on the last page of his paper (see biblio.).


Sharp, P.M., Shields, D.C., Wolfe, K.H., Li, W.-H.,
Chromosomal location and evolutionary rate variation in enterobacterial
genes.  1989 Science vol. 246 pp. 808-810.

Sharp, P.M.,
Processes of genome evolution reflected by base frequency differences among
Serratia marcescens genes.  1990 Molecular Microbiology vol. 4 pp. 119-122.

Wolfe, K.H., Sharp, P.M., Li, W.-H.,
Mutation rates differ among regions of the mammalian genome.
1989 Nature vol. 337 pp. 283-285.

Ina, Y.,
New methods for estimating the numbers of synonymous and nonsynonymous
substitutions.  1995 J. Mol. Evol. vol. 40 pp. 190-226.

Hope this helps,


James O. McInerney PhD           email: J.mcinerney at
Senior Scientific Officer,       phone: +44 171 938 9247
Department of Zoology,
The Natural History Museum,
Cromwell Road,
London SW7 5BD.
