PIR weirdness

Scott Rose rose at galtee.cs.wisc.edu
Mon Aug 2 09:45:42 EST 1993

In article <9308021205.AA26976 at bengal.tigr.org> jckelley at tigr.org (John C. Kelley) writes:

>	We have just noticed a peculiarity in PIR Release 36 that I hope
>someone can explain (and tell me it hasn't been there too long, since I
>haven't noticed it earlier...).  See below:
>                5        10        15        20        25        30
>      1 G D V E K G K K I F(V,Q.K.C.A.Q.C.H.T.C,E.K.G.G.K.H)K V G P
>     31 N L Y G L I G R K T G Q A A G F S Y T D A N K N K G I T W(G.
>     61 E,D,T.L.M.E.Y)L E N P K K Y I P G T K M I F A G I(K.K.K.G.E.
>     91 R.Q)D L I A Y(L.K.S,A,C,S,K)
>This is the sequence from entry A00021.  What I am curious about is
>the portions surrounded by parens whose proteins are separated by
>commas or periods.  At first thought, this seems to indicate that
>these proteins are uncertain, that it could be any one in the list.
>This doesn't bear out however since some commercial releases of
>this entry simply strip the extraneous characters out leaving the
>sequence as is.  The only other thing I can think is that this is
>pointing out some artifact of this part of the sequence.
>Could someone please enlighten me as to the meaning of this sequence?

Perhaps this documentation that I found squirreled away will help.  
No offense to squirrels intended, and I can't say with certainty 
where this came from:

                Punctuation in Protein Sequences

   Two adjacent amino acids, with no punctuation or with a blank
   between, indicates that they are connected, as determined
   experimentally.
() Encloses a region, the composition but not the complete
   sequence of which has been determined experimentally, or
   encloses a single residue that has been tentatively
   identified.
=  Indicates )(, the juxtaposition of two regions of
   indeterminate sequence, while preserving proper spacing
   between amino acids.
/  Indicates that the adjacent amino acids are from different
   peptides, not necessarily connected. When the amino end of a
   protein has not been determined, / precedes the first
   residue. When the carboxyl end has not been determined, /
   follows the last residue. When )/, /(, or )/( are needed,
   only / is used.
.  Outside of parentheses, indicates the ends of sequenced
   fragments. The relative order of these fragments was not
   determined experimentally but is clear from homology or other
   indirect evidence.
.  Within parentheses, indicates that the amino acid to its left
   has been placed with at least 90% confidence by homology with
   known sequences.
,  Indicates that the amino acid to its left could not be
   positioned with confidence by homology.
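
The rules above can be applied mechanically. What follows is only a
rough sketch in Python, not anything that comes from PIR: the function
names strip_punctuation and annotate and the status labels are
hypothetical, chosen for illustration. It assumes that position numbers
and the ruler line can simply be ignored, and it makes no attempt to
interpret "/", which the documentation says can stand in for ")/", "/(",
or ")/(" and so is ambiguous without more context.

```python
import re


def strip_punctuation(seq: str) -> str:
    """Keep only the residue letters, dropping blanks, digits and punctuation."""
    return re.sub(r"[^A-Za-z]", "", seq).upper()


def annotate(seq: str):
    """Return (residue, status) pairs following the punctuation rules.

    Status labels (hypothetical names, not PIR terminology):
      sequenced          - outside parentheses
      in_region          - inside parentheses, no mark to its right
      placed_by_homology - inside parentheses, followed by "."
      not_positioned     - inside parentheses, followed by ","
    """
    out = []
    in_parens = False
    for ch in seq:
        if ch == "(":
            in_parens = True
        elif ch == ")":
            in_parens = False
        elif ch.isalpha():
            out.append([ch.upper(), "in_region" if in_parens else "sequenced"])
        elif in_parens and out and out[-1][1] == "in_region":
            # "." and "," describe the residue to their left.
            if ch == ".":
                out[-1][1] = "placed_by_homology"
            elif ch == ",":
                out[-1][1] = "not_positioned"
        # "=" means ")(", so the parenthesized state carries across it;
        # "/", blanks and digits are skipped in this sketch.
    return [tuple(pair) for pair in out]


if __name__ == "__main__":
    tail = "D L I A Y(L.K.S,A,C,S,K)"  # end of entry A00021 as quoted above
    print(strip_punctuation(tail))
    for residue, status in annotate(tail):
        print(residue, status)
```

For the last line of A00021, strip_punctuation gives DLIAYLKSACSK,
which is presumably what the releases that drop the extra characters
end up with. annotate keeps the distinction that stripping throws away:
D, L, I, A, Y come out as sequenced, while the residues inside the
parentheses are marked according to the "." or "," to their right.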
