most common amino acid?

Bernard Murray, PhD spam at
Mon Nov 22 19:58:29 EST 1999

In article <3839A22F.93BC4E3B at>, "Andrea Beckel-Mitchener, PhD"
<amitch at> wrote:

> Thanks for the calculation!  What exactly is "SwissProt release 36" and
> what is the logic behind normalizing everything to W?
> Andrea
> "Bernard Murray, PhD" wrote:
> > If it is the latter I calculated the relative abundance in all
> > sequences in SwissProt release 36 (relative to W) as follows;

SwissProt is a widely recognised database of protein sequences.

The current release number is 38 (but I haven't had time to
take a fresh database home for my own use).
   Release 36 had 26,840,295 amino acids in 74,019 sequences
and release 38 increases these figures to 29,085,965 and 80,000.

I happened to have the full text of release 36 on my hard disk
and so read in all the sequences and counted up the amino acids.
There are a lot of potential problems with this approach, as some
of the SwissProt entries are "theoretical proteins" and some
proteins are represented more than once (e.g. cytochrome c from
many organisms).
I could/should have done it for an organism (e.g. S. cerevisiae)
in which the sequences of all the ORFs are known.
   Amino acid "W" (tryptophan) is the least abundant so I quoted
all the other residues as abundance relative to this one.  I
thought this was easier to look at than the raw numbers of
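
For anyone wanting to repeat this kind of count, the procedure described
above (tally every residue, then divide by the tryptophan count) can be
sketched roughly as follows in Python. This is only an illustration, not
the script that was actually used: the function names and the toy
sequences are made up, reading the sequences out of the SwissProt flat
file is left out, and the sketch assumes that only the 20 standard
one-letter codes should be counted.

```python
from collections import Counter

# Assumption: only the 20 standard one-letter codes are counted;
# ambiguity codes such as B, Z and X are skipped.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def count_residues(sequences):
    """Tally standard amino acids over an iterable of sequence strings."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in STANDARD_RESIDUES)
    return counts


def relative_abundance(counts, reference="W"):
    """Express each count as a multiple of the reference residue's count."""
    ref_count = counts.get(reference, 0)
    if ref_count == 0:
        raise ValueError(f"reference residue {reference!r} was never seen")
    return {aa: n / ref_count for aa, n in counts.items()}


if __name__ == "__main__":
    # Toy sequences for illustration only, not real SwissProt entries.
    toy = ["MKWLA", "AAWK"]
    relative = relative_abundance(count_residues(toy))
    for aa, ratio in sorted(relative.items(), key=lambda kv: -kv[1]):
        print(f"{aa}  {ratio:.2f}")
```

Dividing by the W count means tryptophan itself always comes out as 1.0
and every other residue reads as "so many times more common than W".
The same caveats apply as above: duplicated or theoretical entries are
counted like any other unless they are filtered out first.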

Bernard P. Murray, PhD
bpmurray at cgl . ucsf . edu
Department of Cellular & Molecular Pharmacology, UCSF
