AAA compositions (sic)

POSTMAST at GUNBRF.bitnet
Wed Nov 20 17:38:00 EST 1991


In message <9111200739.AA07044 at genbank.bio.net> posted to proteins
Michael Clarke asked:
> Can anyone suggest how I might go about determining the amino acid
> composition of a group of related proteins?

In reply to an earlier question he posted to genbank-bb, I had said:
> These data are obtainable from the [PIR] PSQ program by using the command
>   USAGE/CURRENT/BRIEF
> after invoking the appropriate database, e.g.
>   PSQ PIR1
> Similar compositional frequencies for selected subsets of sequences can be
> obtained after first using the appropriate FIND command.

The PSQ program has several commands useful for selecting appropriately
related entries.  Once a set of entries is selected, the USAGE command above
will produce the composition table.  For example,
  FIND HEMOGLOBIN
  USAGE/CURRENT/BRIEF
produces the following composition table for the 341 hemoglobin entries in
the PIR1 database.

Cumulative frequencies from 341 entries 48939 residues
  5584 11.4% Ala A  2196  4.5% Glu E   536  1.1% Met M  1077  2.2% Tyr Y
  1238  2.5% Arg R  3261  6.7% Gly G  2670  5.5% Phe F  4618  9.4% Val V
  1821  3.7% Asn N  2875  5.9% His H  1765  3.6% Pro P    61  0.1% Asx B
  2638  5.4% Asp D  1032  2.1% Ile I  3034  6.2% Ser S    39  0.1% Glx Z
   563  1.2% Cys C  5812 11.9% Leu L  2412  4.9% Thr T
  1220  2.5% Gln Q  3887  7.9% Lys K   600  1.2% Trp W
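
For readers without access to PSQ, a cumulative table of the same kind can
be approximated with a short script.  The sketch below is not how PSQ works
internally; the function names composition and format_table and the two toy
sequence fragments are assumptions made for illustration, and the output
layout only loosely imitates the table above.

```python
from collections import Counter

# Three-letter names for the one-letter codes, including the ambiguity
# codes B (Asx), Z (Glx) and X (unknown) that appear in the tables.
NAMES = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx", "X": "Xaa",
}


def composition(sequences):
    """Return (entry count, residue count, per-residue Counter)."""
    counts = Counter()
    n_entries = 0
    for seq in sequences:
        n_entries += 1
        counts.update(seq.upper())
    return n_entries, sum(counts.values()), counts


def format_table(n_entries, total, counts):
    """Render counts and percentages, one residue per line."""
    lines = [f"Cumulative frequencies from {n_entries} entries {total} residues"]
    for code in sorted(counts):
        n = counts[code]
        pct = 100.0 * n / total
        lines.append(f"{n:6d} {pct:4.1f}% {NAMES.get(code, '???')} {code}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy fragments, not real database entries.
    selected = ["MVLSPADK", "MVHLTPEEK"]
    print(format_table(*composition(selected)))
```

The selection step (the equivalent of FIND) is left out; any list of
one-letter sequences for the related proteins can be passed in.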

He had earlier asked for and received composition tables for the PIR and
SWISS-PROT databases.  Of possible interest is the composition table for
the protein sequences from the Brookhaven Protein Data Bank in the PIR
NRL_3D database.

Cumulative frequencies from 1045 entries 177811 residues
 14991  8.4% Ala A  8733  4.9% Glu E  3268  1.8% Met M  6210  3.5% Tyr Y
  6845  3.8% Arg R 14740  8.3% Gly G  6563  3.7% Phe F 12693  7.1% Val V
  8625  4.9% Asn N  4036  2.3% His H  7741  4.4% Pro P    32  0.0% Asx B
  9831  5.5% Asp D  9236  5.2% Ile I 13280  7.5% Ser S    14  0.0% Glx Z
  3607  2.0% Cys C 14209  8.0% Leu L 11325  6.4% Thr T  2809  1.6%  X  X
  6167  3.5% Gln Q 10299  5.8% Lys K  2557  1.4% Trp W
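
One way to read the two tables side by side is to divide each residue's
frequency in the hemoglobin set by its frequency in NRL_3D.  The sketch
below does that arithmetic for the 20 standard amino acids, using the counts
and totals printed above; the function name enrichment and the choice of
NRL_3D as the reference set are assumptions for illustration, not part of
PSQ.

```python
# Counts copied from the two tables above (Asx, Glx and X omitted).
HEMOGLOBIN_TOTAL = 48939
NRL3D_TOTAL = 177811

hemoglobin = {
    "A": 5584, "R": 1238, "N": 1821, "D": 2638, "C": 563,
    "Q": 1220, "E": 2196, "G": 3261, "H": 2875, "I": 1032,
    "L": 5812, "K": 3887, "M": 536, "F": 2670, "P": 1765,
    "S": 3034, "T": 2412, "W": 600, "Y": 1077, "V": 4618,
}

nrl3d = {
    "A": 14991, "R": 6845, "N": 8625, "D": 9831, "C": 3607,
    "Q": 6167, "E": 8733, "G": 14740, "H": 4036, "I": 9236,
    "L": 14209, "K": 10299, "M": 3268, "F": 6563, "P": 7741,
    "S": 13280, "T": 11325, "W": 2557, "Y": 6210, "V": 12693,
}


def enrichment(subset, subset_total, reference, reference_total):
    """Ratio of each residue's frequency in subset to that in reference."""
    return {
        aa: (subset[aa] / subset_total) / (reference[aa] / reference_total)
        for aa in subset
    }


ratios = enrichment(hemoglobin, HEMOGLOBIN_TOTAL, nrl3d, NRL3D_TOTAL)

if __name__ == "__main__":
    # Print residues from most to least enriched in the hemoglobin set.
    for aa, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{aa} {ratio:5.2f}")
```

A ratio above 1 means the residue is more frequent in the hemoglobin
entries than in NRL_3D; for example, His (5.9% against 2.3%) comes out well
above 1, and Ile (2.1% against 5.2%) well below.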

------------------------------------------------------------------------
                                 Dr. John S. Garavelli
                                 Database Coordinator
                                 Protein Identification Resource
                                 National Biomedical Research Foundation
                                 Washington, DC  20007
                                 POSTMASTER at GUNBRF.BITNET


