NCBI UniGene files

jkb at jkb at
Tue Jan 30 09:46:31 EST 2001

In <3A764BFE.5217FA63 at> Bill Blackhall <b.blackhall at> writes:

> How can I get Pregap to recognise the individual ESTs inside the cluster
> file that NCBI's UniGene database generates? Sequencher will align them
> all simply by importing the cluster file, but Pregap and Gap only see
> the cluster file as one long sequence rather than picking out the
> individual ESTs. Any help will be welcome.

What format are these cluster files? We don't directly support any
multi-sequence file formats in pregap4 (although we probably ought to), so
regardless of format you'll need to separate them out into multiple files.

If they are fasta, there's a fasta2exp program (actually it's an nawk script,
so you may need to edit the first line to change the awk interpreter if it's
not in /usr/bin) that will split the fasta file into multiple experiment
files. This doesn't help if you want the trace files visible though, as a
fasta file holds only the sequences.
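
If editing the nawk script is awkward, the splitting step itself is easy to
sketch in Python. This is not fasta2exp and it does not write experiment
files; it only assumes the cluster file is plain multi-sequence fasta (which
is still an open question here) and writes one single-sequence fasta file per
EST. The function names, the ".fasta" suffix and the 60-column line width are
all made up for illustration.

```python
import os
import re


def split_fasta(text):
    """Parse multi-sequence fasta text into (header, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def safe_name(header, index):
    """Turn the first word of a header into a file-system-safe name."""
    words = header.split()
    first = words[0] if words else ""
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", first)
    return cleaned or "seq%d" % index


def write_records(records, out_dir, width=60):
    """Write each record to its own fasta file and return the paths."""
    paths = []
    used = set()
    for i, (header, seq) in enumerate(records, start=1):
        name = safe_name(header, i)
        if name in used:
            # Avoid overwriting when two headers share a first word.
            name = "%s_%d" % (name, i)
        used.add(name)
        path = os.path.join(out_dir, name + ".fasta")
        with open(path, "w") as fh:
            fh.write(">%s\n" % header)
            for pos in range(0, len(seq), width):
                fh.write(seq[pos:pos + width] + "\n")
        paths.append(path)
    return paths
```

Reading the cluster file, passing its contents to split_fasta() and then the
result to write_records() gives one file per EST, which can then be converted
or listed for pregap4 in whatever way suits. As with fasta2exp, no trace data
comes along.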

James Bonfield (jkb at   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at
