mkkuhner at genetics.washington.edu, wgallin at gpu.srv.ualberta.ca and
galtier at acnuc.univ-lyon1.fr all wrote approximately:
> In article <4hmte8$674 at phelix.umd.edu> moths at Glue.umd.edu (Andrew Mitchell) writes:
> >As we are all aware, base composition biases can seriously affect
> >phylogenetic analyses of DNA sequence data. I have seen many papers in
> >which such biases are assessed by examining the G+C content of
> >sequences. If this value is approximately 50% then authors conclude
> >there is no base composition bias. However, that 50% G+C could break
> >down further into 45% G, 5% C, 10% A and 40% T - extreme composition
> >bias. So why the fixation with G+C content? Is it simply a hangover
> >from the days before DNA sequencing, or did I miss something?
> Chargaff's corollary to Watson-Crick base pairing requires that G=C and A=T.
I'm with you, Andrew, and must respectfully disagree with Mary, Warren and
Nicolas. Certainly genomic G=C and A=T so long as W-C base pairing obtains.
But our analyses focus on only one strand (generally the coding strand,
though apart from its translation into amino acids that choice is
arbitrary), and the A/C/G/T composition of a single strand can in principle
be anything.
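A quick way to see the two-strands point: pool a single strand with its reverse complement and Chargaff's equalities appear automatically, whatever the single strand looks like. This is a hypothetical Python sketch; the 20-base sequence is made up to match Andrew's 45% G / 5% C / 10% A / 40% T example and is not real data.

```python
from collections import Counter

# Watson-Crick pairing: each base and its partner on the opposite strand
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def composition(seq: str) -> dict[str, float]:
    """Return the fraction of A, C, G and T in seq."""
    counts = Counter(seq)
    return {base: counts[base] / len(seq) for base in "ACGT"}


# Made-up strand: 45% G, 5% C, 10% A, 40% T (Andrew's hypothetical numbers)
strand = "GGGTTTTGGAGTTTTGGGCA"
duplex = strand + reverse_complement(strand)  # both strands pooled

print("one strand:  ", composition(strand))
print("both strands:", composition(duplex))
```

The single strand keeps its strong G/C and A/T imbalance at 50% G+C, while the pooled counts satisfy G=C and A=T (here all four come out at 25%).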
Consider primate mtDNA. For some reason, protein-coding sequences have
roughly equal A and C content, much reduced T content, and almost no Gs.
For example, across 6 mtDNA genes from 5 primates, the mean percent
compositions are A = 37%, C = 39%, T = 19% and G = 5%. That is a G+C
content of 44%, not far from 50%, even though C outnumbers G nearly
eightfold.
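To put numbers on the contrast, one option (an assumed choice here, not something the post or the papers it cites prescribe) is to report GC skew, (G-C)/(G+C), and AT skew, (A-T)/(A+T), alongside G+C content. Both skews are 0 when a strand obeys G=C and A=T. A minimal Python sketch using the mean primate mtDNA figures above and Andrew's hypothetical composition:

```python
def summarize(freqs: dict[str, float]) -> dict[str, float]:
    """G+C content plus GC and AT skew for one strand's base frequencies."""
    a, c, g, t = (freqs[base] for base in "ACGT")
    return {
        "G+C": g + c,
        "GC skew": (g - c) / (g + c),  # 0 when G = C
        "AT skew": (a - t) / (a + t),  # 0 when A = T
    }


compositions = {
    "primate mtDNA": {"A": 0.37, "C": 0.39, "G": 0.05, "T": 0.19},  # means quoted above
    "hypothetical": {"A": 0.10, "C": 0.05, "G": 0.45, "T": 0.40},   # Andrew's example
    "uniform": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

for name, freqs in compositions.items():
    stats = summarize(freqs)
    print(f"{name:14s}" + "  ".join(f"{k} = {v:5.2f}" for k, v in stats.items()))
```

The primate means give G+C = 0.44 but a GC skew of about -0.77. Andrew's hypothetical has exactly the same 0.50 G+C as the uniform case, yet a GC skew of +0.80 and an AT skew of -0.60.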
And I think your original point is valid: most of our favorite estimation
programs (be they for phylogeny or substitution-rate estimation) are quite
sensitive to the underlying base frequencies ON ONE STRAND. I believe that's
the point of, for example, Kondo et al., JME 36:517, and Perna and Kocher,
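As one concrete case of that sensitivity, the F81 model (Felsenstein 1981; chosen here purely for illustration, not taken from the papers cited) corrects an observed proportion p of differing sites using the equilibrium base frequencies: d = -B ln(1 - p/B), with B = 1 minus the sum of the squared frequencies. A minimal Python sketch, with p = 0.20 picked arbitrarily:

```python
import math


def f81_distance(p: float, freqs: dict[str, float]) -> float:
    """F81-corrected distance from the observed proportion p of differing sites.

    d = -B * ln(1 - p / B), where B = 1 - sum(pi_i ** 2).
    With all four frequencies at 0.25, B = 0.75 and this is Jukes-Cantor.
    """
    b = 1.0 - sum(f * f for f in freqs.values())
    if not 0.0 <= p < b:
        raise ValueError("distance undefined: p must satisfy 0 <= p < B")
    return -b * math.log(1.0 - p / b)


compositions = {
    "uniform": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},        # G+C = 50%
    "hypothetical": {"A": 0.10, "C": 0.05, "G": 0.45, "T": 0.40},   # G+C = 50%
    "primate mtDNA": {"A": 0.37, "C": 0.39, "G": 0.05, "T": 0.19},  # G+C = 44%
}

p = 0.20  # arbitrary observed proportion of differing sites
for name, freqs in compositions.items():
    print(f"{name:14s} d = {f81_distance(p, freqs):.4f}")
```

The two compositions with identical 50% G+C give different corrected distances (about 0.233 versus 0.241), so G+C content alone does not determine what this kind of correction uses.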
Or maybe I missed something!
Daniel M. Weinreich email: dmw at mcz.harvard.edu
Harvard University usmail: 26 Oxford Street
Museum of Comparative Zoology Cambridge, MA 02138
voice: (617) 495-1954 fax: (617) 495-5846