Base pair encoding

Don Gilbert gilbertd at cricket.bio.indiana.edu
Mon Jul 1 17:25:16 EST 1991


In article <1991Jul1.185311.8785 at jax.org> mrk at jax.org (Michael Kosowsky) writes:
>
>How do GENBANK and NCBI's GENINFO symbolize uncertain base pairs?
>
>I've so far learned of three incompatible systems.
>For example, to represent "A or G", Microgenie
>uses 'P', REBASE uses 'R', and DNA Inspector uses something
>like '(A/G)'.
>
>I naively hope to get away with implementing just one.
>
>
>-- Michael

Everybody should use the IUB nomenclature now (shouldn't they?).
This is summarized here from the GCG software manual:

         GCG uses the one-letter codes for amino acids and nucleotide
    ambiguity proposed by IUB (Nomenclature Committee, 1985,
    Eur. J. Biochem. 150: 1-5). These codes are compatible with the codes
    used by the EMBL, GenBank, and PIR data libraries.
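Since the question was how to map other notations onto one system, here is a minimal sketch (not from the GCG manual) that turns a list of alternative bases, such as DNA Inspector's "(A/G)", into the single IUB letter from the table below. The function name `bases_to_iub` and the slash-separated input format are illustrative assumptions; only the letter assignments come from the table.

```python
# Sketch: map a set of alternative bases to its single IUB letter.
# Letter assignments follow the IUB table in this post; the parsing of
# "(A/G)"-style input is a hypothetical example format.

IUB_BY_BASES = {
    frozenset("A"): "A",
    frozenset("C"): "C",
    frozenset("G"): "G",
    frozenset("T"): "T",
    frozenset("AC"): "M",
    frozenset("AG"): "R",
    frozenset("AT"): "W",
    frozenset("CG"): "S",
    frozenset("CT"): "Y",
    frozenset("GT"): "K",
    frozenset("ACG"): "V",
    frozenset("ACT"): "H",
    frozenset("AGT"): "D",
    frozenset("CGT"): "B",
    frozenset("ACGT"): "N",
}


def bases_to_iub(alternatives: str) -> str:
    """Return the IUB letter for a slash-separated choice such as "(A/G)"."""
    # Drop surrounding parentheses, ignore case, and treat U as T (T/U row).
    cleaned = alternatives.strip().strip("()").upper().replace("U", "T")
    bases = frozenset(b.strip() for b in cleaned.split("/") if b.strip())
    try:
        return IUB_BY_BASES[bases]
    except KeyError:
        raise ValueError(f"not a valid set of bases: {alternatives!r}") from None


print(bases_to_iub("(A/G)"))    # R
print(bases_to_iub("(C/G/T)"))  # B
```

Keying the dictionary on a `frozenset` makes the lookup independent of the order in which the alternatives are written, so "(G/A)" and "(A/G)" both give R.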


                                   NUCLEOTIDES

         The meaning of each symbol, its complement, and the Cambridge
    (Staden/Sanger) equivalents are shown below. Cambridge files can be
    converted into GCG files and vice versa with the programs FROMSTADEN
    and TOSTADEN.
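The following is a hypothetical sketch of a per-character conversion between the Staden/Sanger column and the IUB letters in the table below. It is not the FROMSTADEN/TOSTADEN source and does not reproduce their behaviour; it handles only the codes the table lists and rejects everything else. Mapping both Staden unknowns ("-" and "X") to N, and N back to "-", is an assumption.

```python
# Hypothetical sketch: convert between Staden/Sanger codes and IUB codes,
# covering only the rows of the table. NOT the FROMSTADEN/TOSTADEN code.

STADEN_TO_IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "5": "M", "R": "R", "7": "W", "8": "S", "Y": "Y", "6": "K",
    "-": "N", "X": "N",  # assumption: both Staden "unknowns" become N
}

# IUB -> Staden. V, H, D and B are "not supported" in Staden, so they are
# deliberately absent and will raise an error.
IUB_TO_STADEN = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "5", "R": "R", "W": "7", "S": "8", "Y": "Y", "K": "6",
    "N": "-", "X": "-",  # assumption: "-" chosen over "X" for unknowns
}


def convert(seq: str, table: dict) -> str:
    """Translate seq character by character using table."""
    out = []
    for pos, ch in enumerate(seq.upper(), start=1):
        if ch not in table:
            raise ValueError(f"position {pos}: {ch!r} has no equivalent")
        out.append(table[ch])
    return "".join(out)


print(convert("AC5G-T", STADEN_TO_IUB))  # ACMGNT
print(convert("ACMGNT", IUB_TO_STADEN))  # AC5G-T
```

Raising on V, H, D and B, rather than silently writing an unknown, keeps the loss of information visible when going from IUB to Staden.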

               IUB/GCG      Meaning     Complement   Staden/Sanger

                   A             A             T             A
                   C             C             G             C
                   G             G             C             G
                  T/U            T             A             T
                   M           A or C          K             5
                   R           A or G          Y             R
                   W           A or T          W             7
                   S           C or G          S             8
                   Y           C or T          R             Y
                   K           G or T          M             6
                   V        A or C or G        B       not supported
                   H        A or C or T        D       not supported
                   D        A or G or T        H       not supported
                   B        C or G or T        V       not supported
                  X/N     G or A or T or C     X            -/X
                   .    not G or A or T or C   .       not supported
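A minimal sketch of using the Complement column to reverse-complement a sequence that contains ambiguity codes. It also checks that each complement in the table is the code for the complemented set of bases (for example, M = A or C complements to K = T or G). The names `IUB_COMPLEMENT`, `IUB_BASES` and `reverse_complement` are illustrative, not from GCG.

```python
# Sketch: reverse-complement a sequence containing IUB ambiguity codes,
# using the Complement column of the table.

# Complement of each code. U is treated as T (the T/U row); N and X both
# mean "any base" and map to themselves; "." is left unchanged.
IUB_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "N": "N", "X": "X", ".": ".",
}

# Bases each code stands for (the Meaning column), used only to verify
# that every complement is the code for the complemented base set.
IUB_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}
PAIR = {"A": "T", "C": "G", "G": "C", "T": "A"}

for code, bases in IUB_BASES.items():
    complemented = {PAIR[b] for b in bases}
    assert set(IUB_BASES[IUB_COMPLEMENT[code]]) == complemented, code


def reverse_complement(seq: str) -> str:
    """Reverse-complement seq; raise ValueError on a non-IUB character."""
    try:
        return "".join(IUB_COMPLEMENT[ch] for ch in reversed(seq.upper()))
    except KeyError as err:
        raise ValueError(f"not an IUB nucleotide code: {err.args[0]!r}") from None


print(reverse_complement("ATGRYN"))  # NRYCAT
```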



-- 
Don Gilbert                                     gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405



