In <francis.durst.209.00F22C00 at bota-ulpnospam.u-strasbg.fr> francis.durst at bota-ulpnospam.u-strasbg.fr (Francis Durst) writes:
> In article <3A776EF3.642CDBDC at staff.usyd.edu.au> Bill Blackhall <b.blackhall at staff.usyd.edu.au> writes:
> >From: Bill Blackhall <b.blackhall at staff.usyd.edu.au>
> >Subject: Re: NCBI UniGene files
> >Date: Wed, 31 Jan 2001 12:48:36 +1100
> >The NCBI files have a file extension of .cgi (what that means, I have no
> >idea). They appear to be simple text files with each EST within them in
> >fasta format. Each EST begins on a new line with the > symbol, then some
> >text, and then the sequence starting on a new line. There is no trace
> >data associated with them. Some of the files contain 100 or more ESTs,
> >so copying and pasting into separate files could get tedious.
> You may use Seqverter (www.genestudio.com/seqverter.htm) to
> split multi-sequence fasta files into individual files (also fasta).
readseq is another alternative; it supports many formats (although I do not
know whether the Experiment File format is among them). Gap4 also reads plain
text files (just the sequence, with no header at all), but it cannot read
fasta directly. That's a bit of an omission I guess, for what is probably the
most widely used format.
Anyway, as they're just plain text fasta format files I'd still suggest trying
fasta2exp as supplied with the Staden Package.
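If none of these tools is to hand, the splitting step itself is easy to script. Below is a rough Python sketch (not part of the Staden Package or of any tool mentioned in this thread). It assumes the layout described in the quoted message: each EST starts with a ">" line and the sequence follows on the next lines. The function names parse_fasta and split_fasta, and the output file naming, are made up for illustration.

```python
import os
import re


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from multi-sequence fasta text."""
    records = []
    header = None
    seq_lines = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between entries
        if line.startswith(">"):
            # A new entry begins: store the previous one, if any
            if header is not None:
                records.append((header, "".join(seq_lines)))
            header = line[1:].strip()
            seq_lines = []
        elif header is not None:
            # Sequence line belonging to the current entry
            seq_lines.append(line)
    if header is not None:
        records.append((header, "".join(seq_lines)))
    return records


def split_fasta(text, out_dir, width=60):
    """Write each entry to its own fasta file and return the paths written."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for index, (header, seq) in enumerate(parse_fasta(text), start=1):
        # Hypothetical naming scheme: running number plus the first word
        # of the header, with unsafe characters replaced by "_"
        words = header.split()
        name = re.sub(r"[^A-Za-z0-9._-]", "_", words[0]) if words else "seq"
        path = os.path.join(out_dir, "%03d_%s.fasta" % (index, name))
        with open(path, "w") as handle:
            handle.write(">%s\n" % header)
            # Wrap the sequence at a fixed line width
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
        paths.append(path)
    return paths
```

To use it, read one of the downloaded files into a string and call split_fasta(text, "ests"); the individual files can then be passed to whichever converter you prefer.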
James
--
James Bonfield (jkb at mrc-lmb.cam.ac.uk) Tel: 01223 402499 Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/