NCBI UniGene files

jkb at mrc-lmb.cam.ac.uk
Wed Jan 31 09:26:41 EST 2001


In <francis.durst.209.00F22C00 at bota-ulpnospam.u-strasbg.fr> francis.durst at bota-ulpnospam.u-strasbg.fr (Francis Durst) writes:

> In article <3A776EF3.642CDBDC at staff.usyd.edu.au> Bill Blackhall <b.blackhall at staff.usyd.edu.au> writes:
> >From: Bill Blackhall <b.blackhall at staff.usyd.edu.au>
> >Subject: Re: NCBI UniGene files
> >Date: Wed, 31 Jan 2001 12:48:36 +1100
> 
> >The NCBI files have a file extension of .cgi (what that means, I have no
> >idea). They appear to be simple text files with each EST within them in
> >fasta format. Each EST begins on a new line with the > symbol, then some
> >text, and then the sequence starting on a new line. There is no trace
> >data associated with them. Some of the files contain 100 or more ESTs,
> >so copying and pasting into separate files could get tedious.
> 
> You may use Seqverter (www.genestudio.com/seqverter.htm) to
> split multi-sequence fasta files into individual files (also fasta).

readseq is another option, and it supports many formats (although I do not
know whether it supports the Experiment File format). Gap4 also reads plain
text files (just the sequence, with no header at all), but it does not read
fasta directly. That's a bit of an omission I guess, given that fasta is
probably the most widely used format.

Anyway, as they're just plain text fasta format files I'd still suggest trying 
fasta2exp as supplied with the Staden Package.
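
If neither tool is to hand, the splitting itself is simple enough to script.
What follows is only a rough sketch in Python, not how fasta2exp or Seqverter
work. The function names (split_fasta, safe_name, write_records) and the
output file naming scheme are made up for illustration. It assumes each
record starts with a ">" line followed by one or more sequence lines, as
described above. With plain=True the header is dropped, giving the
sequence-only text files mentioned earlier.

```python
import os
import re


def split_fasta(text):
    """Return a list of (header, sequence) pairs from multi-sequence fasta text."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def safe_name(header, index):
    """Build a file name stem from the first word of the header (made-up scheme)."""
    words = header.split()
    word = words[0] if words else "seq"
    # Replace characters that are awkward in file names, such as "|".
    word = re.sub(r"[^A-Za-z0-9._-]", "_", word)
    return "%03d_%s" % (index, word)


def write_records(records, outdir, plain=False, width=60):
    """Write one file per record; plain=True omits the ">" header line."""
    os.makedirs(outdir, exist_ok=True)
    paths = []
    for i, (header, seq) in enumerate(records, 1):
        ext = ".txt" if plain else ".fasta"
        path = os.path.join(outdir, safe_name(header, i) + ext)
        with open(path, "w") as out:
            if not plain:
                out.write(">" + header + "\n")
            # Wrap the sequence at a fixed line width.
            for start in range(0, len(seq), width):
                out.write(seq[start:start + width] + "\n")
        paths.append(path)
    return paths
```

To use it, read the downloaded file into a string, pass that to split_fasta,
and hand the result to write_records along with an output directory of your
choosing.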

James
--
James Bonfield (jkb at mrc-lmb.cam.ac.uk)   Tel: 01223 402499   Fax: 01223 213556
Medical Research Council - Laboratory of Molecular Biology,
Hills Road, Cambridge, CB2 2QH, England.
Also see Staden Package WWW site at http://www.mrc-lmb.cam.ac.uk/pubseq/





