Translation of protein sequence to nucleotide sequence
jkb at mrc-lmb.cam.ac.uk
Thu Aug 13 03:26:01 EST 1998
In article <35CEA9D3.AE38C855 at imcb.nus.edu.sg> mcbaet at MCBSGS1.IMCB.NUS.EDU.SG (Anthony Ting) writes:
> Is anyone aware of a program that translate a protein sequence into
>a nucleotide sequence given a specific codon usage table?
The "pip" program (also xpip) in the Staden Package can manage
this. It is option "27 = Back translate to dna". Be sure to type "d27"
rather than just "27", for the additional dialogue questions, or when
using xpip have the "execute with dialogue" button enabled. Eg:
? Menu or option number=m1
0 = List of menus
3 = Read a new sequence
4 = Redefine active region
5 = List a sequence
6 = List a text file
7 = Direct output to disk
8 = Write active region to disk
9 = Edit the sequence
17 = Short sequence search
18 = Compare a sequence
19 = Compare a sequence using a score matrix
27 = Back translate to dna
? Menu or option number=d27
Back translate to dna
? No codon preference (y/n) (y) = n
? Codon table file name=
I've included a copy of the help for the dialogue in this article.
Please also see the Staden Package web pages for more details (in my .sig).
Help on 'Back translate to dna' (option 27)
This routine back translates protein sequences into DNA using
the standard genetic code. The level of redundancy can be plotted
and the backtranslation saved to a file.
The translation can use either the IUB symbols shown below, or
a set of codon preferences. If a set of codon preferences is used
they must conform to the format of codon tables produced by the
nucleotide analysis program, and the back translation will contain
the favoured codons. If there is no favoured codon the IUB symbols
will be employed. The window length for plotting the redundancy is
The program will plot the redundancy along the sequence and
hence can be used to find the best sequences to use as primers. Note
that the program plots the inverse, and so the higher the plot the
LESS redundant the sequence. For primers look for peaks rather than
troughs.
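
To make the "inverse redundancy" idea concrete, here is a minimal Python
sketch. It is not the pip code: the help text does not give the exact
formula or the window length, so the function name inverse_redundancy, the
default window of 3 residues, and the choice of scoring a window as 1 over
the product of the codon counts are all assumptions made for illustration.

```python
import math

# Number of codons per amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}

def inverse_redundancy(protein, window=3):
    """Return one score per window position; higher means LESS redundant.

    The score is 1 / (product of codon counts in the window), which is an
    assumed formula, not necessarily the one pip uses.
    """
    counts = [CODON_COUNT[aa] for aa in protein.upper()]
    scores = []
    for start in range(len(counts) - window + 1):
        combos = math.prod(counts[start:start + window])
        scores.append(1.0 / combos)
    return scores

if __name__ == "__main__":
    seq = "MWLSRMWKF"  # hypothetical example sequence
    for pos, score in enumerate(inverse_redundancy(seq)):
        print(pos, seq[pos:pos + 3], round(score, 4))
```

Windows rich in Met and Trp (one codon each) score highest, which is where
a low-degeneracy primer would be easiest to design.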
The DNA sequence can be saved to a file and analysed using the
nucleotide analysis program. Depending on the application it is
often useful to produce a back translation using both a table of
codon preferences and one using the IUB symbols. This is because the
restriction enzyme search program can distinguish between definite
and possible cuts in the sequence. What the program terms
"definite matches" are ones in which the
specification of the recognition sequence corresponds exactly to
that of the back translation. The program will also find what it
terms "possible matches" which are ones that depend on the
particular codons chosen for each amino acid. These are sites at
which recognition sequences could be engineered to produce a cut in
the DNA without changing the amino acid, but which are not
necessarily found in the original sequence.
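
The distinction between definite and possible matches can be sketched in
Python as follows. This is only an illustration of the idea described
above, not the Staden search code: the function name classify_sites is
made up, only one strand is scanned, and "possible" is taken to mean that
every position of the recognition sequence is compatible with the IUB
symbol at that position of the back translation.

```python
# Bases allowed by each IUB symbol.
IUB_SETS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}

def classify_sites(backtrans, site):
    """Return (definite, possible) lists of 0-based match positions.

    definite: the back translation matches the site symbol for symbol.
    possible: each site symbol shares at least one base with the
              corresponding (possibly degenerate) back-translation symbol.
    """
    definite, possible = [], []
    n = len(site)
    for i in range(len(backtrans) - n + 1):
        window = backtrans[i:i + n]
        if window == site:
            definite.append(i)
        elif all(set(IUB_SETS[s]) & set(IUB_SETS[w])
                 for s, w in zip(site, window)):
            possible.append(i)
    return definite, possible

if __name__ == "__main__":
    # Glu-Phe back translated with IUB symbols is GARTTY; an EcoRI site
    # (GAATTC) could be engineered there without changing the protein.
    print(classify_sites("GARTTY", "GAATTC"))   # possible match
    print(classify_sites("GAATTC", "GAATTC"))   # definite match
```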
R (A,G) 'puRine'
Y (T,C) 'pYrimidine'
W (A,T) 'Weak'
S (C,G) 'Strong'
M (A,C) 'aMino'
K (G,T) 'Keto'
H (A,T,C) 'not G'
B (G,C,T) 'not A'
V (G,A,C) 'not T'
D (G,A,T) 'not C'
N (G,A,C,T) 'aNy'
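
For anyone who wants to see the principle outside pip, below is a rough
Python sketch of back translation with the symbols listed above. It is an
assumption-laden illustration, not the Staden algorithm: back_translate
and its "preferred" argument (a plain dict of amino acid to favoured
codon) are hypothetical, and the real program reads its own codon table
file format, which is not reproduced here. Where no favoured codon is
given, each codon position is collapsed to the IUB symbol covering all
bases seen there. For Leu, Ser and Arg this simple per-position union is
broader than the true codon set (e.g. Leu becomes YTN, which also covers
the Phe codons).

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and '*' for stop) to the list of its codons.
CODONS_FOR = {}
for bases, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS_FOR.setdefault(aa, []).append("".join(bases))

# IUB symbols keyed by the alphabetically sorted set of bases they cover.
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "AT": "W", "CG": "S", "AC": "M", "GT": "K",
    "ACT": "H", "CGT": "B", "ACG": "V", "AGT": "D", "ACGT": "N",
}

def back_translate(protein, preferred=None):
    """Back translate a protein string to DNA.

    preferred: optional dict {amino acid: favoured codon}; residues not in
    the dict fall back to IUB symbols. Unknown residues become NNN.
    """
    preferred = preferred or {}
    out = []
    for aa in protein.upper():
        if aa in preferred:
            out.append(preferred[aa])
        elif aa in CODONS_FOR:
            codons = CODONS_FOR[aa]
            out.append("".join(
                IUB["".join(sorted({c[i] for c in codons}))]
                for i in range(3)))
        else:
            out.append("NNN")
    return "".join(out)

if __name__ == "__main__":
    print(back_translate("MKFI"))
    print(back_translate("MKFI", preferred={"K": "AAA", "F": "TTC"}))
```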
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/