Why look at G+C content?
Fri Mar 8 02:41:22 EST 1996
In article 674 at phelix.umd.edu, moths at Glue.umd.edu (Andrew Mitchell) writes:
> As we are all aware, base composition biases can seriously affect
> phylogenetic analyses of DNA sequence data. I have seen many papers in
> which such biases are assessed by examining the G+C content of
> sequences. If this value is approximately 50% then authors conclude
> there is no base composition bias. However, that 50% G+C could break
> down further into 45% G, 5% C, 10% A and 40% T - extreme composition
> bias. So why the fixation with G+C content? Is it simply a hangover
> from the days before DNA sequencing, or did I miss something?
> Andrew Mitchell
I agree with M. Kuhner and W. Gallin about Chargaff's rules. For a
theoretical discussion of this subject, see the papers by Lobry (J. Mol.
Evol. and Mol. Biol. Evol., 1994-95).
I can further answer as a practitioner: in actual sequences, G% and C% are
highly correlated. Sometimes you encounter a sequence with unusually high
A, C, G or T content, but most of the variability in base composition is
well described by GC%. That's why many (but not all) evolutionary models
assume A=T and C=G within a given DNA strand: the three free compositional
parameters of the general case (four frequencies that sum to 1) can be
seen as too high a cost for representing base composition, whereas under
A=T and C=G a single parameter, GC%, is enough.
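
To make the point concrete, here is a minimal sketch in Python that computes the four base frequencies, the GC content, and two simple within-strand skew statistics for a toy sequence built to match the 45% G, 5% C, 10% A, 40% T composition in the question. The function names (base_composition, gc_content, strand_skews) and the toy sequences are hypothetical and only for illustration; the skew statistics (G-C)/(G+C) and (A-T)/(A+T) are one common way to measure departure from A=T, C=G, not something taken from the papers cited above.

```python
from collections import Counter


def base_composition(seq):
    """Return the fractions of A, C, G and T among the unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in "ACGT"}


def gc_content(freqs):
    """GC content: the single parameter needed when A=T and C=G hold."""
    return freqs["G"] + freqs["C"]


def strand_skews(freqs):
    """Return (GC skew, AT skew); both are 0 when A=T and C=G in the strand."""
    gc = freqs["G"] + freqs["C"]
    at = freqs["A"] + freqs["T"]
    gc_skew = (freqs["G"] - freqs["C"]) / gc if gc else 0.0
    at_skew = (freqs["A"] - freqs["T"]) / at if at else 0.0
    return gc_skew, at_skew


# Toy sequences (assumed for illustration, not real data).
skewed = "G" * 45 + "C" * 5 + "A" * 10 + "T" * 40    # composition from the question
balanced = "G" * 25 + "C" * 25 + "A" * 25 + "T" * 25  # satisfies A=T, C=G

for name, seq in (("skewed", skewed), ("balanced", balanced)):
    freqs = base_composition(seq)
    gc_skew, at_skew = strand_skews(freqs)
    print(f"{name}: GC={gc_content(freqs):.2f} "
          f"GC skew={gc_skew:+.2f} AT skew={at_skew:+.2f}")
```

Both toy sequences have the same GC content, so GC% alone cannot tell them apart; only the skews (or the full four frequencies) reveal the bias in the first one. In real sequences the skews are usually small, which is why GC% captures most of the variation.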