Here is another paper (from our lab) that shows a correlation between
protein composition and nucleotide composition.
J Mol Evol 1997 Mar;44(3):282-8
Nucleotide composition bias affects amino acid content in proteins coded by
Foster PG, Jermiin LS, Hickey DA
Department of Biology, University of Ottawa, Ontario, Canada.
We show that in animal mitochondria homologous genes that differ in guanine
plus cytosine (G + C) content code for proteins differing in amino acid
content in a manner that relates to the G + C content of the codons. DNA
sequences were analyzed using square plots, a new method that combines
graphical visualization and statistical analysis of compositional
differences in both DNA and protein. Square plots divide codons into four
groups based on first and second position A + T (adenine plus thymine) and G
+ C content and indicate differences in amino acid content when comparing
sequences that differ in G + C content. When sequences are compared using
these plots, the amino acid content is shown to correlate with the
nucleotide bias of the genes. This amino acid effect is shown in all
protein-coding genes in the mitochondrial genome, including cox I, cox II,
and cyt b, mitochondrial genes which are commonly used for phylogenetic
studies. Furthermore, nucleotide content differences are shown to affect the
content of all amino acids with A + T- and G + C-rich codons. We speculate
that phylogenetic analysis of genes so affected may tend erroneously to
indicate relatedness (or lack thereof) based only on amino acid content.
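
The grouping step behind the square plots can be illustrated roughly as follows. This is only a sketch of the codon classification described in the abstract (first and second position, A + T versus G + C), not the authors' plotting or statistical method. The function names and the "AT/GC"-style labels are made up for this example, and stop codons are not treated specially.

```python
from collections import Counter

GC_BASES = set("GC")  # everything else (A, T) is counted as A + T


def codon_group(codon):
    """Label a codon by the class of its first and second positions.

    Returns one of 'AT/AT', 'AT/GC', 'GC/AT', 'GC/GC' (labels are
    hypothetical, chosen for this sketch).
    """
    first, second = codon[0].upper(), codon[1].upper()
    label_first = "GC" if first in GC_BASES else "AT"
    label_second = "GC" if second in GC_BASES else "AT"
    return f"{label_first}/{label_second}"


def group_counts(seq):
    """Count the codons of an in-frame coding sequence in each group."""
    seq = seq.upper().replace("U", "T")
    counts = Counter()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if set(codon) <= set("ACGT"):  # skip codons with ambiguous bases
            counts[codon_group(codon)] += 1
    return counts


if __name__ == "__main__":
    print(group_counts("ATGGCATCAGAT"))
```

Comparing these four counts between homologous genes from a G + C-rich and an A + T-rich genome gives the kind of shift the abstract refers to.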
Jean Lobry wrote in message <714r9s$pv7 at net.bio.net>...
>On Wed, 21 Oct 1998 17:53:14, Dr.Ram Samudrala
><ram.samudrala at stanford.nojunkemail>
>
>> Have there been any estimates published on the frequency/distribution
>> of nucleotides/amino acids in DNA/protein sequences PRIOR to
>> selection, i.e., before selection kicks in? I can get the frequencies
>> of amino acids in known protein sequences, but these proteins have
>> generally arisen through natural selection.
>
>I'm not sure that this answers your question, but in
>
>  Lobry, J.R. (1997) Influence of genomic G+C content on average amino-acid
>  composition of proteins from 59 bacterial species. Gene 205:309-316
>
>you will find how to compute theoretical amino-acid frequencies as a function
>of nucleotide frequencies in the absence of selection. This is not difficult;
>you just have to take care that there are no stop codons in coding
>Jean R. Lobry (lobry at biomserv.univ-lyon1.fr)
>Laboratoire BGBP-CNRS-UMR-5558, Univ. C. Bernard - LYON I,
>43 Bd 11/11/1918, F-69622 VILLEURBANNE CEDEX, FRANCE
>phone : (33) 472 43 12 87 fax : (33) 478 89 27 19
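
A minimal sketch of the kind of calculation described in the quoted message: amino-acid frequencies expected from nucleotide frequencies alone, with stop codons excluded and the remainder renormalized. The assumptions here are mine and are not taken from the Gene paper: the three codon positions are independent, G and C are equally frequent, A and T are equally frequent, and the standard genetic code applies (animal mitochondrial codes differ). The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order (third position varies fastest).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def expected_aa_freqs(gc):
    """Expected amino-acid frequencies for a given G+C fraction (0..1).

    Assumes independent positions with P(G) = P(C) = gc / 2 and
    P(A) = P(T) = (1 - gc) / 2. Stop codons ('*') are dropped and the
    remaining probabilities are rescaled to sum to 1.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    freqs = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # no stop codons inside a coding sequence
        weight = p[codon[0]] * p[codon[1]] * p[codon[2]]
        freqs[aa] = freqs.get(aa, 0.0) + weight
    total = sum(freqs.values())
    return {aa: f / total for aa, f in freqs.items()}


if __name__ == "__main__":
    for gc in (0.25, 0.50, 0.75):
        f = expected_aa_freqs(gc)
        print(f"G+C={gc:.2f}  Lys={f['K']:.3f}  Gly={f['G']:.3f}")
```

Under these assumptions, amino acids with A + T-rich codons (such as Lys) become rarer and those with G + C-rich codons (such as Gly) become more common as G + C content rises.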