Why look at G+C content?

Mary K. Kuhner mkkuhner at phylo.genetics.washington.edu
Thu Mar 7 20:20:52 EST 1996


In article <4hmte8$674 at phelix.umd.edu> moths at Glue.umd.edu (Andrew Mitchell) writes:
>As we are all aware, base composition biases can seriously affect 
>phylogenetic analyses of DNA sequence data.   I have seen many papers in 
>which such biases are assessed by examining the G+C content of 
>sequences.  If this value is approximately 50% then authors conclude 
>there is no base composition bias.  However, that 50% G+C could break 
>down further into 45% G, 5% C, 10% A and 40% T - extreme composition 
>bias.  So why the fixation with G+C content?  Is it simply a hangover 
>from the days before DNA sequencing, or did I miss something?

The genome as a whole, if it is base-paired, must have G=C and A=T when
both strands are counted together, so the only mechanism that could
produce unequal G and C within one strand would be one that was specific
to that strand, here the coding strand (i.e. the sequences you are
looking at have more G than C; their complements on the non-coding strand
have more C than G).  Most forms of mutation seem unlikely to know which strand is
which, though I suppose a mutation mechanism related to transcription and
thus acting on the transcribed strand only is possible.  (Something like
"If you transcribe through a C it may turn to G" would eventually lead
to a low proportion of C relative to G on the coding strand.)
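
To make the base-pairing argument concrete, here is a minimal Python sketch
(not from the original post; the toy sequence and the helper names
reverse_complement, base_counts, gc_content and gc_skew are made up for
illustration).  It builds a strand with the 45% G, 5% C, 10% A, 40% T
composition from the question, then compares it with its complementary
strand and with both strands pooled.  The skew statistic (G-C)/(G+C) is one
common way to summarise within-strand G versus C imbalance; it is used here
as an assumption, not as the method any particular paper applied.

```python
from collections import Counter

# Translation table for Watson-Crick complements (DNA, unambiguous bases only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def base_counts(seq):
    """Count A, C, G and T; any other symbol is ignored."""
    counts = Counter(seq.upper())
    return {base: counts[base] for base in "ACGT"}


def gc_content(seq):
    """Fraction of counted bases that are G or C."""
    c = base_counts(seq)
    total = sum(c.values())
    return (c["G"] + c["C"]) / total if total else 0.0


def gc_skew(seq):
    """(G - C) / (G + C): zero when G and C are balanced on this strand."""
    c = base_counts(seq)
    gc = c["G"] + c["C"]
    return (c["G"] - c["C"]) / gc if gc else 0.0


# Hypothetical 100-base strand with the composition from the question.
strand = "G" * 45 + "C" * 5 + "A" * 10 + "T" * 40
other_strand = reverse_complement(strand)
duplex = strand + other_strand  # both strands pooled

for name, seq in [("strand", strand), ("complement", other_strand), ("duplex", duplex)]:
    print(name, base_counts(seq), "G+C =", gc_content(seq), "skew =", gc_skew(seq))
```

All three cases have the same G+C content, which is why G+C alone cannot
reveal the bias in the question; the per-strand skew has opposite signs on
the two strands and cancels when they are pooled.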

In organisms such as RNA viruses where the genome is not base-paired you
could indeed have arbitrary nucleotide composition bias.

Mary Kuhner mkkuhner at genetics.washington.edu



More information about the Mol-evol mailing list