Translating from nucleic acid to amino acid.

John Powell jip at helix.nih.gov
Thu Mar 2 16:14:59 EST 1995


We have placed a modified version of the GDE translate program (for
translating from nucleic acid to amino acid) on our anonymous FTP site.

The original code has been modified to handle packed FASTA-formatted input
files. We also "purified" the code, leaving no known memory leaks.  The
program processes the input one sequence at a time.

We ran a 6-frame translation of gbest (all 86K+ sequences) in one pass on
a SPARC2 with 65M of swap space, using the following command:

zcat gbest.seq.Z |gb2fasta - |translate -tbl 1 -frame 6 - |compress -c > gbest_6f.seq.Z
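
Reading one sequence at a time is what lets a pipeline like this run in a
small, roughly constant amount of memory regardless of input size. The
Python sketch below illustrates that streaming pattern only; it is not the
translate source. The name read_fasta is hypothetical, and the sketch
assumes that "packed" FASTA means many '>' records concatenated in one
stream.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs one record at a time.

    Only the current record is held in memory, so the input can be
    arbitrarily large (for example, a pipe from zcat).
    """
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a piped FASTA stream.
    demo = io.StringIO(">seq1 first record\nATGGCC\nTAA\n>seq2\nGGGCCC\n")
    for name, seq in read_fasta(demo):
        print(name, len(seq))
```

In a real pipeline the same generator could be fed sys.stdin instead of
the in-memory demo string.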

The program is available via anonymous ftp from milo.dcrt.nih.gov
(128.231.129.60) under pub/translate as the compressed tar file
translate.tar.Z.


The man page for the modified program follows:
------------------------------------------------------------------------------
TRANSLATE(1)             USER COMMANDS               TRANSLATE(1)


NAME
     translate - translates from nucleic acid to amino acid

SYNOPSIS
         translate [-tbl codon_table] [-frame #]  [-min_frame  #]
         [-3] [-gde] [-noc] infile|-

         translate [-h[elp]]

DESCRIPTION
     The translate program translates the selected sequences from
     DNA/RNA to amino acid.  It accepts sequences in either packed
     FASTA or GDE format.  Output is written to standard output.

     Note that the frame number is appended to the sequence  name
     as ".framenumber".


OPTIONS
     [-tbl codon_table]
                 stop codon table to use:
                      1 = Universal           2 = Mycoplasma
                      3 = Yeast               4 = Vert. mito.
                 Default is Universal.

     [-frame #]  Nucleic acid "frame" to translate:
                      1 = first frame          2 = second frame
                      3 = third frame          6 = all six frames
                 Defaults to all six frames.

     [-min_frame #]
                 minimum open reading frame (i.e. shortest  amino
                 acid sequence to translate). Default is zero (no
                 minimum).

     [-3]        use  triple  letter  codes.  Default  is  single
                 letter codes.

     [-gde]      input sequence is  in  GDE  format.  Default  is
                 FASTA  format. GDE format with '#' or '%' in the
                 first line is not recognized.

     [-noc]      do not include the sequence description/comments
                 from  the  first line of sequence in the output.
                 Useful only with FASTA format.

     infile|-    input is either a packed FASTA sequence file,
                 standard input (-) through a pipe, or a GDE
                 format file with the -gde switch.
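
For readers unfamiliar with six-frame translation, the following Python
sketch shows the general idea behind the -frame option and the
".framenumber" naming. It is an illustration under stated assumptions,
not the program's own code: it uses only the standard genetic code,
assumes frames 4-6 are read from the reverse complement, and writes "*"
for stop codons and "X" for codons it cannot resolve. The man page does
not specify any of these details, and -min_frame is not modeled. The
function names are hypothetical.

```python
from itertools import product

# Standard genetic code in NCBI order (first base varies slowest, T C A G).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq):
    """Upper-case the sequence and treat RNA (U) as DNA (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq):
    """Return the reverse complement of a normalized DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate_frame(seq, offset):
    """Translate complete codons starting at a 0-based offset."""
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    # Codons containing ambiguity symbols are reported as X (assumption).
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(name, seq):
    """Yield (name.frame, protein) for frames 1-6.

    Assumption: frames 1-3 read the given strand, frames 4-6 read the
    reverse complement; the real program's numbering may differ.
    """
    forward = normalize(seq)
    reverse = reverse_complement(forward)
    for frame in range(1, 7):
        strand = forward if frame <= 3 else reverse
        yield f"{name}.{frame}", translate_frame(strand, (frame - 1) % 3)


if __name__ == "__main__":
    for label, protein in six_frames("demo", "ATGGCCTAA"):
        print(label, protein)
```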





-- 
--------
	John Powell 			phone: (301) 496-2963
	Building 12A, Room 2033		FAX: (301) 402-2867
	National Institutes of Health
	Bethesda, MD 20892		Internet: jip at helix.nih.gov




More information about the Bio-soft mailing list