6-frame EST translation

Tim Cutts timc at chiark.greenend.org.uk
Fri Sep 25 09:42:13 EST 1998


This is a follow-up to my previous post.  It corrects my description of
what the program does, and also fixes a bug.

In article <q3s*r8TGn at news.chiark.greenend.org.uk>,
Tim Cutts <timc at chiark.greenend.org.uk> wrote:

>I wrote a (quite fast) program in C to do precisely this.  It takes a
>nucleic acid database in fasta format as input, and produces six fasta
>format protein databases as output.  It's part of my tpatterns
>package (a sort of findpatterns that translates nucleic acid databases
>on the fly)

Actually, it creates one output database.  Each nucleic acid entry
gets converted into six protein entries, the titles of which have
(Frame 0-5) added.
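
For readers unfamiliar with six-frame translation, here is a minimal
sketch of the bookkeeping involved.  It is not taken from tpatterns.
The frame numbering (0-2 on the forward strand, 3-5 on the reverse
complement), the exact title format, and the helper names revcomp(),
frame_codons() and frame_title() are all assumptions made for
illustration.

```c
#include <stdio.h>
#include <string.h>

/* Complement of one base, always returned in upper case.
   Anything unrecognised becomes 'N'. */
static char complement(char b)
{
    switch (b) {
    case 'A': case 'a': return 'T';
    case 'C': case 'c': return 'G';
    case 'G': case 'g': return 'C';
    case 'T': case 't':
    case 'U': case 'u': return 'A';
    default:            return 'N';
    }
}

/* Write the reverse complement of seq[0..len-1] into out,
   which must have room for len + 1 chars. */
static void revcomp(const char *seq, size_t len, char *out)
{
    size_t i;
    for (i = 0; i < len; i++)
        out[i] = complement(seq[len - 1 - i]);
    out[len] = '\0';
}

/* ASSUMED numbering: frames 0-2 read the forward strand and
   frames 3-5 read the reverse complement, each starting at
   offset frame % 3.  Returns the number of whole codons. */
static size_t frame_codons(size_t len, int frame)
{
    size_t offset = (size_t)(frame % 3);
    return len > offset ? (len - offset) / 3 : 0;
}

/* Build a fasta title line for one frame.  The exact format
   used by tpatterns is not known; this one is hypothetical. */
static int frame_title(char *buf, size_t buflen,
                       const char *title, int frame)
{
    return snprintf(buf, buflen, ">%s (Frame %d)", title, frame);
}
```

Under these assumptions, a caller would loop over frames 0 to 5, pick
the forward or reverse-complemented sequence, skip frame % 3 bases, and
translate frame_codons() codons under the title from frame_title().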

Translation of the whole of dbEST on a single CPU of our Sun
Enterprise 3000 took 3 min 42 sec.

>Obviously, you need to get dbEST into fasta format first, but you may
>already have that, especially if you use BLAST locally.
>
>I've never officially released tpatterns, but you can download it from:
>
>http://www.bio.cam.ac.uk/~tjrc1/software/tpatterns/tpatterns-1.0.tar.gz

Change the filename in that URL to tpatterns-1.01.tar.gz, or modify the
function initbasebits() in translate.c as follows.

After the lines:

  basebits['A'] = 0;
  basebits['C'] = 1;
  basebits['G'] = 2;
  basebits['T'] = 3;
  basebits['U'] = 3;

add the lines:

  basebits['a'] = 0;
  basebits['c'] = 1;
  basebits['g'] = 2;
  basebits['t'] = 3;
  basebits['u'] = 3;

otherwise tpatterns can't cope with lower case nucleotides!
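
To show why the table needs both cases, here is a minimal sketch of a
base-to-bits lookup feeding a codon table.  It is not the real
translate.c.  Only the basebits[] assignments come from the fix above;
the -1 sentinel, the codon_table layout, the 'X' result for unknown
bases, and translate_codon() are assumptions for illustration.

```c
#include <string.h>

/* 2-bit code per base character, or -1 if unrecognised
   (the -1 sentinel is an assumption of this sketch). */
static signed char basebits[256];

/* Standard genetic code, indexed by 16*b1 + 4*b2 + b3
   with A=0, C=1, G=2, T/U=3.  '*' marks a stop codon. */
static const char codon_table[] =
    "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLL"
    "EDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF";

static void initbasebits(void)
{
    memset(basebits, -1, sizeof basebits);

    basebits['A'] = 0;
    basebits['C'] = 1;
    basebits['G'] = 2;
    basebits['T'] = 3;
    basebits['U'] = 3;

    /* Without these, lower-case input is never recognised. */
    basebits['a'] = 0;
    basebits['c'] = 1;
    basebits['g'] = 2;
    basebits['t'] = 3;
    basebits['u'] = 3;
}

/* Translate the three bases at codon[0..2] into one amino
   acid letter, or 'X' if any base is unrecognised. */
static char translate_codon(const char *codon)
{
    int b1 = basebits[(unsigned char)codon[0]];
    int b2 = basebits[(unsigned char)codon[1]];
    int b3 = basebits[(unsigned char)codon[2]];

    if (b1 < 0 || b2 < 0 || b3 < 0)
        return 'X';
    return codon_table[16 * b1 + 4 * b2 + b3];
}
```

With the lower-case entries in place, translate_codon("atg") and
translate_codon("ATG") give the same residue in this sketch.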

Tim.






