In article <4hmte8$674 at phelix.umd.edu> moths at Glue.umd.edu (Andrew Mitchell) writes:
>As we are all aware, base composition biases can seriously affect
>phylogenetic analyses of DNA sequence data. I have seen many papers in
>which such biases are assessed by examining the G+C content of
>sequences. If this value is approximately 50% then authors conclude
>there is no base composition bias. However, that 50% G+C could break
>down further into 45% G, 5% C, 10% A and 40% T - extreme composition
>bias. So why the fixation with G+C content? Is it simply a hangover
>from the days before DNA sequencing, or did I miss something?

The genome as a whole, if it is base-paired, must have G=C and A=T when
both strands are counted, so the only mechanism that could produce G<>C
on one strand would be one that was
specific to the coding strand (i.e. the sequences you are looking at
have more G than C; their complements on the non-coding strand have more
C than G). Most forms of mutation seem unlikely to know which strand is
which, though I suppose a mutation mechanism related to transcription and
thus acting on the transcribed strand only is possible. (Something like
"If you transcribe through a C it may turn to G" would eventually lead
to a low proportion of C relative to G on the coding strand.)

In organisms such as RNA viruses where the genome is not base-paired you
could indeed have arbitrary nucleotide composition bias.
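
To make the strand argument concrete, here is a minimal Python sketch. It is
an illustration only, not taken from any published analysis. The sequence is
made up to match the 45% G, 5% C, 10% A, 40% T example in the quoted question,
and the helper names (base_fractions, reverse_complement, gc_skew) are
hypothetical. The skew statistic (G-C)/(G+C) is not mentioned above; it is
used here only as one convenient way to summarise G versus C on a single
strand.

```python
from collections import Counter

# Translation table for complementing an unambiguous DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def base_fractions(seq):
    """Return the fraction of A, C, G and T in seq (other symbols ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: counts[b] / total for b in "ACGT"}


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(fractions):
    """G+C fraction: the statistic usually reported."""
    return fractions["G"] + fractions["C"]


def gc_skew(fractions):
    """(G - C) / (G + C): zero when G and C are balanced on this strand."""
    return (fractions["G"] - fractions["C"]) / gc_content(fractions)


# Hypothetical 100-base strand with the composition from the quoted example.
strand = "G" * 45 + "C" * 5 + "A" * 10 + "T" * 40
opposite = reverse_complement(strand)

one_strand = base_fractions(strand)
other_strand = base_fractions(opposite)
both_strands = base_fractions(strand + opposite)

for label, fr in [("one strand", one_strand),
                  ("opposite strand", other_strand),
                  ("both strands", both_strands)]:
    print(label, fr, "G+C =", gc_content(fr), "skew =", gc_skew(fr))
```

G+C is the same for all three rows, since complementing swaps G with C and
A with T. The skew is strongly positive on one strand, equally negative on
the other, and zero for the duplex, which is why a bias of this kind has to
come from a strand-specific process.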

Mary Kuhner mkkuhner at genetics.washington.edu