6-frame EST translation
timc at chiark.greenend.org.uk
Fri Sep 25 09:42:13 EST 1998
This is a followup to my previous post. It corrects my description of
what the program does, and also fixes a bug.
In article <q3s*r8TGn at news.chiark.greenend.org.uk>,
Tim Cutts <timc at chiark.greenend.org.uk> wrote:
>I wrote a (quite fast) program in C to do precisely this. It takes a
>nucleic acid database in fasta format as input, and produces six fasta
>format protein databases as output. It's part of my tpatterns
>package (a sort of findpatterns that translates nucleic acid databases
>on the fly)
Actually, it creates one output database. Each nucleic acid entry
gets converted into six protein entries, the titles of which have
(Frame 0-5) added.
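For readers who want to see what "six protein entries per nucleic acid entry" amounts to, here is a minimal sketch in C. It is not the tpatterns source. The names codon_table, base_code, translate_frame and write_six_frames are made up for illustration. The frame numbering (0-2 forward, 3-5 on the reverse complement), the 'X' for codons containing an unrecognised base, and the exact title format are all assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Standard genetic code, indexed by 16*b1 + 4*b2 + b3
   with A=0, C=1, G=2, T/U=3. '*' marks a stop codon. */
static const char codon_table[] =
    "KNKNTTTTRSRSIIMIQHQHPPPPRRRRLLLLEDEDAAAAGGGGVVVV*Y*YSSSS*CWCLFLF";

/* Map a nucleotide (either case) to its 2-bit code, or -1 if unknown. */
static int base_code(char ch)
{
    switch (ch) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': case 'U': case 'u': return 3;
    default: return -1;
    }
}

/* Translate one frame of seq into out, which must hold at least
   strlen(seq)/3 + 1 bytes.
   Assumed numbering: frames 0-2 read the given strand from offsets 0-2,
   frames 3-5 read the reverse complement from offsets 0-2. */
static void translate_frame(const char *seq, int frame, char *out)
{
    size_t len = strlen(seq);
    size_t i, n = 0;
    int reverse;

    assert(frame >= 0 && frame < 6);
    reverse = frame >= 3;

    for (i = (size_t)(frame % 3); i + 3 <= len; i += 3) {
        int idx = 0, unknown = 0, k;
        for (k = 0; k < 3; k++) {
            /* On the reverse strand, walk back from the 3' end. */
            char ch = reverse ? seq[len - 1 - (i + k)] : seq[i + k];
            int code = base_code(ch);
            if (code < 0)
                unknown = 1;
            else  /* 3 - code complements the base: A<->T, C<->G */
                idx = idx * 4 + (reverse ? 3 - code : code);
        }
        out[n++] = unknown ? 'X' : codon_table[idx];
    }
    out[n] = '\0';
}

/* Write six fasta entries for one nucleic acid entry.
   The "(Frame n)" title suffix follows the description above;
   its exact placement is an assumption. Returns 0 on success. */
static int write_six_frames(FILE *fp, const char *title, const char *seq)
{
    char *protein = malloc(strlen(seq) / 3 + 1);
    int frame;

    if (protein == NULL)
        return -1;
    for (frame = 0; frame < 6; frame++) {
        translate_frame(seq, frame, protein);
        fprintf(fp, ">%s (Frame %d)\n%s\n", title, frame, protein);
    }
    free(protein);
    return 0;
}
```

With this numbering, the sequence ATGGCCTAA gives MA* in frame 0, WP in frame 1 and LGH in frame 3 (the reverse complement is TTAGGCCAT).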
Translation of the whole of dbEST on a single CPU of our Sun
Enterprise 3000 took 3 min 42 sec.
>Obviously, you need to get dbEST into fasta format first, but you may
>already have that, especially if you use BLAST locally.
>I've never officially released tpatterns, but you can download it from:
Change that to tpatterns-1.01.tar.gz, or modify the function
initbasebits() in translate.c as follows:
After the lines:

    basebits['A'] = 0;
    basebits['C'] = 1;
    basebits['G'] = 2;
    basebits['T'] = 3;
    basebits['U'] = 3;

add the lines:

    basebits['a'] = 0;
    basebits['c'] = 1;
    basebits['g'] = 2;
    basebits['t'] = 3;
    basebits['u'] = 3;

Otherwise tpatterns can't cope with lower-case nucleotides!
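As an aside, the same fix can be written so that the upper- and lower-case entries cannot drift apart. The following is only a sketch of that idea, not the code in translate.c. The names basebits and initbasebits() come from the fix above, but the array type, its size, the -1 marker for unrecognised characters and the helper lookup_base() are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Assumed layout: one entry per possible unsigned char value.
   -1 marks characters that are not A, C, G, T or U. */
static signed char basebits[256];

static void initbasebits(void)
{
    static const char bases[] = "ACGTU";
    static const signed char codes[] = { 0, 1, 2, 3, 3 };
    size_t i;

    /* Every byte becomes 0xFF, which reads back as -1 in a signed char. */
    memset(basebits, -1, sizeof basebits);

    for (i = 0; bases[i] != '\0'; i++) {
        unsigned char upper = (unsigned char)bases[i];
        /* Fill both cases from the same table entry. */
        basebits[upper] = codes[i];
        basebits[tolower(upper)] = codes[i];
    }
}

/* Hypothetical accessor: ch must be an unsigned char value (0-255). */
static int lookup_base(int ch)
{
    assert(ch >= 0 && ch < 256);
    return basebits[ch];
}
```

After initbasebits() has run, lookup_base('g') and lookup_base('G') both return 2, and a character such as 'N' returns -1 rather than silently being treated as a base.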