most common amino acid?
Bernard Murray, PhD
spam at 127.0.0.1
Mon Nov 22 19:58:29 EST 1999
In article <3839A22F.93BC4E3B at uiuc.edu>, "Andrea Beckel-Mitchener, PhD"
<amitch at uiuc.edu> wrote:
> Thanks for the calculation! What exactly is "SwissProt release 36" and
> what is the logic behind normalizing everything to W?
> "Bernard Murray, PhD" wrote:
> > If it is the latter I calculated the relative abundance in all
> > sequences in SwissProt release 36 (relative to W) as follows;
SwissProt is a widely-recognised database of protein sequences.
The current release number is 38 (but I haven't had time to
take a fresh database home for my own use).
Release 36 had 26,840,295 amino acids in 74,019 sequences,
and release 38 increases these figures to 29,085,965 and 80,000
respectively.
I happened to have the full text of release 36 on my hard disk
and so read in all the sequences and counted up the amino acids.
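For anyone who wants to repeat this kind of count, here is a minimal
sketch in Python. It is not the program used for the figures quoted
above; the function name relative_abundance and the toy sequences are
made up for illustration, and it assumes the sequences have already
been extracted from the database as plain one-letter strings. It
scales every count to the rarest residue found, which for the
SwissProt release 36 data was W.

```python
from collections import Counter


def relative_abundance(sequences):
    """Count residues over all sequences, scaled so the rarest residue is 1.0.

    Hypothetical helper for illustration; `sequences` is any iterable of
    one-letter amino acid strings.
    """
    counts = Counter()
    for seq in sequences:
        # Counter.update on a string adds one count per character.
        counts.update(seq.upper())
    if not counts:
        return {}
    rarest = min(counts.values())
    return {aa: n / rarest for aa, n in counts.items()}


# Toy input only, NOT real SwissProt data.
toy = ["MKWLA", "AALKG", "GAMLK"]
abundance = relative_abundance(toy)

# Print residues from most to least abundant.
for aa, ratio in sorted(abundance.items(), key=lambda kv: -kv[1]):
    print(f"{aa}  {ratio:.2f}")
```

Dividing by the smallest count means the least abundant residue comes
out as exactly 1 and every other value reads directly as "how many
times more common than the rarest residue".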
There are a lot of potential problems with this approach as some
of the SwissProt entries are "theoretical proteins" and some
proteins are represented more than once (eg. cytochrome c from
I could/should have done it for an organism (eg. S.cerevisiae)
in which the sequences of all the ORFs are known.
Amino acid "W" (tryptophan) is the least abundant so I quoted
all the other residues as abundance relative to this one. I
thought this was easier to look at than the raw numbers of
Bernard P. Murray, PhD
bpmurray at cgl . ucsf . edu
Department of Cellular & Molecular Pharmacology, UCSF