Why look at G+C content?

Warren Gallin wgallin at gpu.srv.ualberta.ca
Fri Mar 8 18:56:42 EST 1996


In Article <Pine.SUN.3.91.960308072543.18505B-100000 at mcz>,
dmw at MCZ.HARVARD.EDU (Daniel Weinreich) wrote:
>mkkuhner at genetics.washington.edu, wgallin at gpu.srv.ualberta.ca and
>galtier at acnuc.univ-lyon1.fr all wrote approximately:
>
[previous stuff deleted]
>
>Dear Andrew,
>
>I'm with you and must respectfully disagree with Mary, Warren and Nicolas. 
>Certainly genomic G=C and A=T so long as W-C base pairing obtains.  But
>our analyses focus on only one strand (generally the coding strand, though
>except for the translation to amino acids, that's arbitrary), whose
>A/C/G/T composition can in principle be anything. 
>
>Consider primate mtDNA.  For some reason, protein-coding sequences have
>roughly equal A and C content, much reduced T content, and nearly no G's. 
>For example, among 6 mtDNA genes from 5 primates, mean percent
>compositions are A = 37%, C = 39%, T = 19% and G = 5%.
>
>And I think your original point is valid: most of our favorite estimation
>programs (be they for phylogeny or substitution rate estimation) are quite
>sensitive to underlying base frequencies ON ONE STRAND.  I believe that's
>the point of, for example, Kondo et al, JME 36:517 and Perna and Kocher,
>MBE 12:359.
>
>Or maybe I missed something!
>Dan.

At the risk of going out on a limb, I think we may be talking apples and
oranges here.  The use of G+C evaluates an overall compositional bias in the
genome, which I suppose might be based, among other things, on the need for
a specific base composition to maintain appropriate melting behaviour under
different physiological conditions.  So that is one kind of correction.

I think that the Perna and Kocher paper is addressing an artefact of using a
parsimony analysis to generate a substitution matrix, and that artefact is
affected by the composition of a single strand.

What I am not clear on is whether these are two separate issues.  Any more
insight out there?
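
To make the two quantities concrete, here is a minimal Python sketch (not part
of the original thread).  It builds a hypothetical 100-base strand matching
the mean primate mtDNA composition Dan quotes above (A = 37%, C = 39%,
T = 19%, G = 5%); the sequence itself and the function names are illustrative
assumptions, not real data.  Under Watson-Crick pairing the complementary
strand has the same G+C content, while the individual base frequencies swap
(A with T, C with G).

```python
from collections import Counter

# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def base_frequencies(seq):
    """Return the fraction of each of A, C, G, T in a single strand."""
    counts = Counter(seq)
    total = len(seq)
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Return the G+C fraction of a single strand."""
    freqs = base_frequencies(seq)
    return freqs["G"] + freqs["C"]


# Hypothetical strand: 37 A, 39 C, 19 T, 5 G (order is irrelevant here).
strand = "A" * 37 + "C" * 39 + "T" * 19 + "G" * 5
other = reverse_complement(strand)

print("strand:", base_frequencies(strand), "G+C =", gc_content(strand))
print("other: ", base_frequencies(other), "G+C =", gc_content(other))
```

G+C comes out the same for both strands, so it cannot distinguish them; the
single-strand frequencies differ, which is the quantity Dan argues the
estimation programs are sensitive to.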



 that the problem with biases that needs to be corrected would be based on
things like different G+C compositions required to keep the melting
behaviour constant under the different physiological conditions of different
organisms, or some other process that will bias the composition of the DNA
in the face of mutational events.  Although we choose to analyze one strand
of a double helix, it seems to me that we could equally well perform the
analysis on the other strand, since the mutational events we are looking at
affect both strands comparably.  The underlying base frequencies on one
strand will be affected 
Warren Gallin,
Department of Biological Sciences, University of Alberta
wgallin at gpu.srv.ualberta.ca


