
Why look at G+C content?

Jean Lobry lobry at evol10.univ-lyon1.fr
Tue Mar 12 05:30:02 EST 1996

In article 674 at phelix.umd.edu, moths at Glue.umd.edu (Andrew Mitchell) writes:
>As we are all aware, base composition biases can seriously affect 
>phylogenetic analyses of DNA sequence data.   I have seen many papers in 
>which such biases are assessed by examining the G+C content of 
>sequences.  If this value is approximately 50% then authors conclude 
>there is no base composition bias.  However, that 50% G+C could break 
>down further into 45% G, 5% C, 10% A and 40% T - extreme composition 
>bias.  So why the fixation with G+C content?  Is it simply a hangover 
>from the days before DNA sequencing, or did I miss something?
>Andrew Mitchell

Dear Andrew,

Your question is interesting, but the answer is complex.

First we have to consider what Chargaff found exactly,
and not only what was retained later.

Here is a quotation from Chargaff, E. (1979) How genetics
got a chemical education. Ann. NY Acad. Sci. 325:345-360.

"The relationships in DNA which probably contributed a 
great deal to the chemical education of biologists, are
as follows. (1) A+G=T+C; (2) A=T; (3) G=C; and as a
logical consequence of these three equations: (4) A+C=
G+T, i.e., the sum of the 6-amino compounds equals that
of the 6-oxo derivatives. The last-mentioned regularity,
the equality of 6-amino and 6-oxo compounds, also applies,
in the absence of the other regularities, to the total
RNA of a cell. Not unrelated to this as yet unexplained
finding may be the later observations from my laboratory,
namely, that in microbial DNA the separated heavy and
light strands, although complementary to each other with
respect to base composition, both exhibit the same
equivalence of 6-amino and 6-oxo bases.
To my knowledge, there have been no follow-up studies of
the last-mentioned observations in other laboratories."
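
Written out, equality (4) is obtained by adding the two pairing equalities (2) and (3) side by side, and (1) follows in the same way:

```latex
\begin{aligned}
A = T,\; G = C \;&\Rightarrow\; A + C = T + G \quad \text{(4: 6-amino = 6-oxo)}\\
A = T,\; G = C \;&\Rightarrow\; A + G = T + C \quad \text{(1: purines = pyrimidines)}
\end{aligned}
```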

Surprising, isn't it? Chargaff also found regularities in the base
composition of single-stranded DNA, but this point seems to have
been lost, perhaps because the A=T and C=G equalities for
double-stranded DNA are so neatly explained by the Watson-Crick
base-pairing rules.

Now that long stretches of DNA sequence are available, we
know that, in general, Chargaff's rules also apply to
single-stranded DNA. Have a look at Fig. 2 in my paper
in J. Mol. Evol. (1995) 40:326-330; the equalities
A=T and C=G for ssDNA are striking (there are, of course,
exceptions, especially for small mitochondrial genomes).
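
One way to look at the single-strand equalities on your own data is to count bases on ONE strand and compute the intra-strand skews (A-T)/(A+T) and (G-C)/(G+C); both are close to zero when Chargaff's rules hold for ssDNA. This is only a minimal sketch: the function name `strand_composition` is hypothetical, and symbols other than A, C, G, T (N, gaps) are simply ignored.

```python
from collections import Counter


def strand_composition(seq):
    """Base fractions and intra-strand skews for ONE strand of DNA.

    Only A, C, G, T are counted; other symbols (N, gaps) are ignored.
    Both skews are near 0 when A = T and G = C hold on the single strand.
    """
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases in sequence")
    a, c, g, t = (counts[b] for b in "ACGT")
    return {
        "gc": (g + c) / total,
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,
        "freqs": {b: counts[b] / total for b in "ACGT"},
    }


# The composition from the original question: 45% G, 5% C, 10% A, 40% T.
example = "G" * 45 + "C" * 5 + "A" * 10 + "T" * 40
print(strand_composition(example))
```

For the question's example, G+C is exactly 0.5, yet the GC skew is 0.8 and the AT skew is -0.6: the single number hides a strong departure from A=T and G=C on the strand.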

Why should Chargaff's rules also apply to ssDNA?
As far as I know, there are two explanations, one
selectionist and one neutralist.

The selectionist interpretation

This is from a paper by Forsdyke, D.R. (1995) J. Mol. Evol.

"It is proposed that Chargaff's rule applies to single-stranded
DNA because there has been an evolutionary selection pressure
favoring mutations that generate complementary oligonucleotides
in close proximity, thus creating a potential to form stem-loops."
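
To make the stem-loop idea concrete: a stem needs a short oligonucleotide followed, a few bases downstream, by its reverse complement. The toy scanner below lists such candidate hairpins. The function names, the stem length and the loop-length bounds are illustrative assumptions, and it ignores mismatches, G-T pairs and folding energetics, so it is a sketch of the idea rather than a structure predictor.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, stem=4, min_loop=3, max_loop=8):
    """Return (start, loop_length) for each perfect inverted repeat.

    A hit means seq[start:start+stem] is followed, after loop_length
    unpaired bases, by its exact reverse complement, so the strand
    could fold back on itself into a stem-loop.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - 2 * stem - min_loop + 1):
        arm = reverse_complement(seq[start:start + stem])
        for loop in range(min_loop, max_loop + 1):
            j = start + stem + loop
            if j + stem > len(seq):
                break  # the second arm would run past the end
            if seq[j:j + stem] == arm:
                hits.append((start, loop))
    return hits


# Stem AAGC pairs with GCTT across the loop TTTT.
print(find_hairpins("AAGCTTTTGCTT"))  # -> [(0, 4)]
```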

If you accept that ALL nucleotides in DNA are involved in
stem-loops, then this is a good interpretation for you.
If you are reluctant to believe that ALL nucleotides are
involved in stem-loops... well... the neutralist interpretation
is for you 8-)

The neutralist interpretation

Under no-strand-bias conditions, that is, when there is
no strand bias in EITHER the mutational OR the selective
process, the model for base substitution reduces to a
6-parameter model. See the paper by Sueoka, N. (1995)
J. Mol. Evol. 40:318-325 for a detailed explanation of
how this 6-parameter model is obtained. This model predicts
that, at equilibrium, Chargaff's rules also apply to ssDNA.
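
The prediction can be checked numerically. In the sketch below, a strand evolves under a continuous-time substitution model in which each substitution type has the same rate as its complementary type on the other strand (A->G with T->C, C->T with G->A, and so on), so the twelve rates collapse to six. The parameter labels and numerical values are illustrative assumptions, not Sueoka's notation or estimates. Even starting from an all-A strand, the equilibrium composition satisfies A = T and G = C.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def no_strand_bias_rates(rates):
    """Build a 4x4 rate matrix q[src][dst] from six substitution rates.

    `rates` maps one representative of each complementary pair, e.g.
    ("A", "G"), to its rate; the complementary substitution (here T->C)
    gets the same rate, which is the no-strand-bias assumption.
    """
    q = {s: {d: 0.0 for d in BASES} for s in BASES}
    for (src, dst), r in rates.items():
        q[src][dst] = r
        q[COMPLEMENT[src]][COMPLEMENT[dst]] = r
    for s in BASES:
        q[s][s] = -sum(q[s][d] for d in BASES if d != s)
    return q


def equilibrium(q, start, steps=5000):
    """Approximate the stationary composition by iterating P = I + Q/lam.

    lam is twice the largest exit rate, so every diagonal entry of P is
    at least 0.5; this keeps the iteration from oscillating.
    """
    lam = 2 * max(-q[s][s] for s in BASES)
    pi = dict(start)
    for _ in range(steps):
        pi = {d: sum(pi[s] * ((s == d) + q[s][d] / lam) for s in BASES)
              for d in BASES}
    return pi


# Six illustrative rates, one per complementary pair of substitution types.
six_rates = {("A", "C"): 0.1, ("A", "G"): 0.4, ("A", "T"): 0.1,
             ("C", "A"): 0.2, ("C", "G"): 0.1, ("C", "T"): 0.5}
q = no_strand_bias_rates(six_rates)
pi = equilibrium(q, start={"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0})
print({b: round(p, 4) for b, p in pi.items()})
# -> {'A': 0.2917, 'C': 0.2083, 'G': 0.2083, 'T': 0.2917}
```

With these rates G+C settles at 5/12 (about 0.417), and the other base frequencies follow from that single value.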

So, back to your original question: why the fixation with
G+C content?

Under no-strand-bias conditions, G+C pressure is the only mutational
pressure that is possible, so dealing only with G+C content is
legitimate.

When the no-strand-bias conditions are violated (e.g. in
small mitochondrial genomes) dealing only with G+C content
is not legitimate.
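
The reason a single number suffices under no-strand-bias conditions can be written out, taking the base frequencies on one strand as fractions summing to 1 and writing x for the G+C fraction:

```latex
\begin{aligned}
&A + C + G + T = 1, \qquad A = T, \qquad G = C, \qquad x = G + C \\
&\Rightarrow\; G = C = \tfrac{x}{2}, \qquad A = T = \tfrac{1 - x}{2}
\end{aligned}
```

Once A = T and G = C fail on the strand, x no longer determines the four frequencies.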

The apparent fixation with G+C content is a consequence of the fact
that, in most cases, we are working under no-strand-bias conditions.


Jean R. Lobry, URA - CNRS 243, Biometry - Genetics and Population Biology
Univ. C. Bernard - LYON I, 43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX
phone  : (33) 72 43 12 87 
fax    : (33) 78 89 27 19             e-mail:lobry at biomserv.univ-lyon1.fr

More information about the Mol-evol mailing list
