Base pair encoding
gilbertd at cricket.bio.indiana.edu
Mon Jul 1 17:25:16 EST 1991
In article <1991Jul1.185311.8785 at jax.org> mrk at jax.org (Michael Kosowsky) writes:
>How do GENBANK and NCBI's GENINFO symbolize uncertain base pairs?
>I've so far learned of three incompatible systems.
>For example, to represent "A or G", Microgenie
>uses 'P', REBASE uses 'R', and DNA Inspector uses something
>I naively hope to get away with implementing just one.
Everybody should use the IUB nomenclature now (shouldn't they?).
This is summarized here from the GCG software manual:
GCG uses the letter codes for amino acid codes and nucleotide
ambiguity proposed by IUB (Nomenclature Committee, 1985,
Eur. J. Biochem. 150: 1-5). These codes are compatible with the codes
used by the EMBL, GenBank, and PIR data libraries.
The meaning of each symbol, its complement, and the Cambridge
(Staden/Sanger) equivalents are shown below. Cambridge files can be
converted into GCG files and vice versa with the programs FROMSTADEN
and TOSTADEN.
IUB/GCG   Meaning                  Complement   Staden/Sanger
A         A                        T            A
C         C                        G            C
G         G                        C            G
T/U       T                        A            T
M         A or C                   K            5
R         A or G                   Y            R
W         A or T                   W            7
S         C or G                   S            8
Y         C or T                   R            Y
K         G or T                   M            6
V         A or C or G              B            not supported
H         A or C or T              D            not supported
D         A or G or T              H            not supported
B         C or G or T              V            not supported
X/N       G or A or T or C         X            -/X
.         not G or A or T or C     .            not supported
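
If you only want to implement one scheme, the table above is small enough to carry around as a pair of lookup tables. Here is a rough Python sketch of that idea. The names (IUB_MEANING, IUB_COMPLEMENT, STADEN_TO_IUB, complement, reverse_complement, matches, staden_to_iub) are made up for illustration and come from no package. Treating U as T and X as "any base" is my reading of the T/U and X/N rows. The Staden mapping covers only the four digit codes and is not a description of what FROMSTADEN actually does.

```python
# Bases each IUB symbol can stand for, transcribed from the table above.
# Assumption: U is treated like T, and X like N (any of the four bases).
IUB_MEANING = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "N": "ACGT", "X": "ACGT",
}

# Complement column of the table ("." is kept as its own complement).
IUB_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "N": "N", "X": "X", ".": ".",
}

# Staden/Sanger digit codes that have a single IUB equivalent in the table.
# The "-/X" entry is left out on purpose because its mapping is ambiguous.
STADEN_TO_IUB = {"5": "M", "6": "K", "7": "W", "8": "S"}


def complement(seq):
    """Complement each symbol; raises KeyError on symbols not in the table."""
    return "".join(IUB_COMPLEMENT[symbol] for symbol in seq.upper())


def reverse_complement(seq):
    """Complement the sequence and reverse it."""
    return complement(seq)[::-1]


def matches(symbol, base):
    """True if the plain base (A, C, G or T) is allowed by the IUB symbol."""
    return base.upper() in IUB_MEANING[symbol.upper()]


def staden_to_iub(seq):
    """Replace Staden digit codes with IUB letters; other symbols pass through."""
    return "".join(STADEN_TO_IUB.get(symbol, symbol) for symbol in seq)
```

With these tables, reverse_complement("AARY") gives "RYTT", and matches("R", "G") is true while matches("R", "C") is false.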
Don Gilbert gilbert at bio.indiana.edu
biocomputing office, biology dept., indiana univ., bloomington, in 47405