PHYLIP (DNAML) Question
James O. McInerney PhD
j.mcinerney at nhm.ac.uk
Tue Jan 23 07:51:44 EST 1996
>I have a question for any of you who are familiar with the use of maximum
>likelihood analysis in Felsenstein's PHYLIP package: How should I set
>the "categories" option for defining how many categories of substitution
>sites there are and the relative rates for each? I'm doing a DNAML and
>DNAMLK analysis of coding regions, so I was thinking that there should be
>2 categories of rates, one for the synonymous substitutions and one for
>the non-synonymous substitutions. An average value for this ratio (for
>42 genes, Nei, 1987, p.80) is about 5.3:1. Should I use this ratio, or
>should I empirically determine what it is for my sequences. If I decide
>to determine this, what program might I use?
Rates of synonymous and non-synonymous substitution vary considerably,
depending on several factors. The base composition of the genome (either
as a whole or in the immediate vicinity of the gene) may affect the rate
of synonymous base changes. If, say, a bacterium has a mutational bias
towards a high G+C content, then this may keep the GC3s (the G+C content
at the third position of synonymous codons) very high. If all the close
relatives of this organism also have high G+C genomes, then the number of
synonymous changes between these genes may be fairly low. If, on the
other hand, there is a drift away from high GC3s values, then the rate of
synonymous substitution may increase.
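
To make the GC3s idea concrete, here is a minimal sketch in Python of how
it could be computed for a single in-frame coding sequence. It assumes the
standard genetic code and takes "synonymous codons" to mean every codon
except Met, Trp and the stop codons; the function name gc3s and that
exact definition are my own choices for illustration, not taken from any
particular program.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (an assumption: organisms
# or organelles with a variant code would need a different table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def gc3s(cds):
    """Fraction of synonymous codons that end in G or C.

    Met (M), Trp (W) and stop codons are skipped because they have no
    synonymous alternatives; codons with ambiguous bases are skipped too.
    """
    cds = cds.upper().replace("U", "T")
    gc = total = 0
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in "MW*":
            continue
        total += 1
        if codon[2] in "GC":
            gc += 1
    return gc / total if total else float("nan")
```

Comparing this value across the genes (or species) in a data set gives a
quick indication of whether base composition at silent sites is similar
or drifting.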
In organisms that have had a large long-term effective population size
(which I won't get into), the highly expressed genes are generally
restricted to an 'optimal' subset of codons. Evolutionary rates are
(generally) lowest in these kinds of genes.
Non-synonymous substitution rates usually reflect how conserved the
gene is. A highly conserved gene, where the amino acids do not change
substantially over time, will obviously have a low rate of non-synonymous
substitution.
Paul Sharp, Ken Wolfe, Wen-Hsiung Li and various other workers have
published quite a lot on this subject. See the end of the message for a
(short) bibliography.
I haven't exhausted all the instances or caveats that pertain to this
question, but perhaps this will put you in the picture. As for a
program, Yasuo Ina says to send him an email:
yina at ddbj.nig.ac.jp
This is on the last page of his paper (see the bibliography).
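
For a rough first look at your own sequences before getting hold of a
proper program, something like the following Python sketch could be used.
It is only an illustration under strong assumptions: two sequences that
are already aligned, in frame and gap-free, the standard genetic code, and
only codon pairs that differ at exactly one position are classified. It
returns raw counts of differences, not rates per site, and applies no
correction for multiple hits, so it is not a substitute for a published
estimation method. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (assumed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def count_single_step_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous, skipped) codon differences.

    Only codon pairs differing at exactly one nucleotide are classified.
    Pairs differing at two or three positions, or containing gaps or
    ambiguous bases, are counted as skipped rather than guessed at.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    seq_a = seq_a.upper().replace("U", "T")
    seq_b = seq_b.upper().replace("U", "T")
    syn = nonsyn = skipped = 0
    for i in range(0, len(seq_a) - len(seq_a) % 3, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            skipped += 1
            continue
        n_diff = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diff != 1:
            skipped += 1
        elif CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn, skipped
```

If the skipped count is large relative to the other two, the sequences
are probably too divergent for this kind of simple counting to mean much.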
Sharp, P.M., Shields, D.C., Wolfe, K.H., Li, W.-H.,
Chromosomal location and evolutionary rate variation in enterobacterial
genes. 1989 Science vol. 246 pp. 808-810.
Sharp, P.M.,
Processes of genome evolution reflected by base frequency differences among
Serratia marcescens genes. 1990 Molecular Microbiology vol. 4(1) pp. 119-122.
Wolfe, K.H., Sharp, P.M., Li, W.-H.,
Mutation rates differ among regions of the mammalian genome.
1989 Nature vol. 337 pp. 283-285.
Ina, Y.,
New methods for estimating the numbers of synonymous and nonsynonymous
substitutions. 1995 J. Mol. Evol. vol. 40 pp. 190-226.
Hope this helps,
James O. McInerney PhD email: J.mcinerney at nhm.ac.uk
Senior Scientific Officer, phone: +44 171 938 9247
Department of Zoology,
The Natural History Museum,
London SW7 5BD.