codon frequency of C. elegans vs. E. coli

Peter Rice pmr at sanger.ac.uk
Fri Jun 2 07:38:55 EST 1995


------------------------------------------------------------------------
In article <D9I86I.GCn at ncifcrf.gov> pnh at fcs260c2.ncifcrf.gov (Paul N Hengen) writes:
>   What is the difference between the codon frequencies of C. elegans vs. E.
>   coli?  Would one expect a cDNA cloned fragment from C. elegans, which has been
>   6xHis-tagged and expressed in E. coli, to possess rare codons which might
>   cause stalling of the ribosome and premature termination of translation
>   leading to smaller peptides than that of the fusion product?

Codon usage tables for C.elegans and E.coli are at:

ftp://ftp.embl-heidelberg.de/pub/databases/transterm/cel.cod   ( 168 genes)
ftp://ftp.embl-heidelberg.de/pub/databases/transterm/eco.cod   (2513 genes)
ftp://ftp.embl-heidelberg.de/pub/databases/transterm/eco_h.cod ( 250 genes)

(eco_h is highly expressed genes only)

Rare E.coli codons (each used in under 10% of cases for its amino acid
in E.coli) are:

             Fraction of codons for that AA in:
AA      Codon  E.coli  E.coli (high)  C.elegans

Arg     AGG  0.03       0.00         0.05
Arg     AGA  0.04       0.00         0.26
Ile     ATA  0.08       0.00         0.09
End     TAG  0.08       0.02         0.15
Arg     CGG  0.09       0.01         0.06
Arg     CGA  0.06       0.01         0.19
Leu     CTA  0.04       0.01         0.05


Other codons that are extremely rare in eco_h (same columns as above) are:

Gly     GGG  0.15       0.04         0.04
Gly     GGA  0.10       0.02         0.69
Ser     AGT  0.14       0.04         0.13
Thr     ACA  0.13       0.04         0.29
Leu     TTG  0.12       0.05         0.22
Leu     TTA  0.12       0.03         0.08
Ser     TCG  0.15       0.07         0.15
Ser     TCA  0.13       0.05         0.24
Leu     CTT  0.10       0.05         0.29
Leu     CTC  0.10       0.08         0.24
Pro     CCC  0.11       0.01         0.06
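
As a rough way to read the two tables together, the sketch below lists
the codons that are rare in E.coli but fairly common in C.elegans. The
numbers are copied from the tables above. The function name
mismatched_codons and the 15% C.elegans cutoff are my own assumptions
for illustration, and stop codons (End) are skipped because they are not
a stalling issue.

```python
# Fractions copied from the two tables in this post.
# codon: (amino acid, E.coli all genes, E.coli highly expressed, C.elegans)
USAGE = {
    "AGG": ("Arg", 0.03, 0.00, 0.05),
    "AGA": ("Arg", 0.04, 0.00, 0.26),
    "ATA": ("Ile", 0.08, 0.00, 0.09),
    "TAG": ("End", 0.08, 0.02, 0.15),
    "CGG": ("Arg", 0.09, 0.01, 0.06),
    "CGA": ("Arg", 0.06, 0.01, 0.19),
    "CTA": ("Leu", 0.04, 0.01, 0.05),
    "GGG": ("Gly", 0.15, 0.04, 0.04),
    "GGA": ("Gly", 0.10, 0.02, 0.69),
    "AGT": ("Ser", 0.14, 0.04, 0.13),
    "ACA": ("Thr", 0.13, 0.04, 0.29),
    "TTG": ("Leu", 0.12, 0.05, 0.22),
    "TTA": ("Leu", 0.12, 0.03, 0.08),
    "TCG": ("Ser", 0.15, 0.07, 0.15),
    "TCA": ("Ser", 0.13, 0.05, 0.24),
    "CTT": ("Leu", 0.10, 0.05, 0.29),
    "CTC": ("Leu", 0.10, 0.08, 0.24),
    "CCC": ("Pro", 0.11, 0.01, 0.06),
}


def mismatched_codons(usage, host_col=1, max_host=0.10, min_worm=0.15):
    """Return codons rare in E.coli but common in C.elegans.

    host_col=1 uses the all-genes E.coli column, host_col=2 the highly
    expressed column. The min_worm cutoff is an arbitrary assumption.
    """
    hits = []
    for codon, row in usage.items():
        aa, host, worm = row[0], row[host_col], row[3]
        if aa != "End" and host < max_host and worm >= min_worm:
            hits.append((codon, aa, host, worm))
    # Most heavily used C.elegans codons first.
    return sorted(hits, key=lambda h: h[3], reverse=True)


if __name__ == "__main__":
    for codon, aa, host, worm in mismatched_codons(USAGE):
        print(f"{aa} {codon}  E.coli {host:.2f}  C.elegans {worm:.2f}")
```

Against all E.coli genes only the two arginine codons AGA and CGA come
out. Passing host_col=2 compares against the highly expressed genes
instead and gives a longer list headed by GGA.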


Certain amino acids are clearly different.

For Arginine, E.coli uses mainly CGT (39%) and CGC (39%), with the
CGR and AGR codons (CGA, CGG, AGA, AGG) making up only the remaining 22%,
whereas C.elegans uses CGT (30%), AGA (26%), CGA (19%), CGC (14%),
CGG (6%) and AGG (5%).

For Glycine, E.coli uses mainly GGC (40%) and GGT (35%) (even more extreme
for highly expressed genes), and C.elegans uses GGA (69%) or GGT (17%).
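
To see where such codons fall in a particular cDNA, something like the
following could be used. It is only a sketch: it assumes the sequence
starts in frame at the first base, it uses the six sense codons from the
first table (the ones under 10% in E.coli) as the rare set, and the name
rare_codon_positions and the example sequence are made up for
illustration.

```python
# Sense codons used in under 10% of cases for their amino acid in E.coli,
# taken from the first table in this post (the TAG stop codon is left out).
RARE_ECOLI = {"AGG", "AGA", "ATA", "CGG", "CGA", "CTA"}


def rare_codon_positions(cds, rare=RARE_ECOLI):
    """Return (codon number, codon) for each rare codon in the sequence.

    The sequence is read in frame from its first base; codon numbers
    start at 1. A trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")  # accept lower case and RNA
    hits = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in rare:
            hits.append((i // 3 + 1, codon))
    return hits


if __name__ == "__main__":
    # Made-up example sequence, not a real C.elegans gene.
    example = "ATGGGAAGACGATTGAGGTAA"
    for number, codon in rare_codon_positions(example):
        print(f"codon {number}: {codon}")
```

A truncated product whose size matches a position reported here would
at least be consistent with the rare codon explanation.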

There are other ways to get a short product. For example, a good
E.coli promoter may lie inside the expected reading frame. This happens
with the native human cystic fibrosis gene, where partial expression
seems to be lethal to the E.coli host. Decent E.coli promoters occur
quite often by chance. I suppose you have an expression vector with
a highly active promoter, so this may be an unlikely explanation.

Presumably you can tell these two cases apart. Premature termination would
lose the C terminus. An alternative promoter would lose the N terminus.

Then again, perhaps a ribosomal frameshift could cause a jump to a shorter
reading frame. See, for example, EMBL entry ECRF2X (accno M11520),
reachable through SRS (http://www.sanger.ac.uk/srs/srsc) as URL

http://www.sanger.ac.uk/srs/srsc?[EMBL-id:ECRF2X]+-sf+EMBL

and look through the comments (CC) lines for more details.

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr at sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England


