Questions on build Genbank files for Blast search

Andrew Dalke dalke at bioreason.com
Fri Feb 26 18:26:41 EST 1999


gmei at genetics.com asked:
> By looking at NCBI web site and asking some people, it seems that
> there could be several options [for doing BLAST searches of the
> GenBank data].

As I remember the process:
1. Get the BLAST executable (I compiled it, but the binaries should
be fine.)
2. Download the *.seq.Z files and uncompress them
3. Run formatdb to convert from FASTA to BLAST 2.x format
4. Run BLAST (I believe NCBI also provides their web front end if you
want that)
5. To update, get their update FASTA formatted files and run fmerge
to combine it with the existing BLAST 2.x files.  You'll want to
start from scratch every month or so since I recall it accumulates
all the sequences, including repeats of the same record.
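
Steps 2 through 4 above can be sketched in Python as follows. This is
only a sketch under several assumptions: the file names (example.seq.Z,
query.fa, hits.txt) are hypothetical, the uncompressed file is assumed
to be FASTA as step 3 describes, and the formatdb and blastall options
shown (-i, -p, -d, -o) are the ones from the legacy NCBI toolkit, so
check them against your installed version. Step 5 (fmerge) is left out
because its options are not given here.

```python
import subprocess
from pathlib import Path


def uncompress_cmd(archive):
    """Command to expand a .Z file in place with the Unix uncompress tool."""
    return ["uncompress", str(archive)]


def formatdb_cmd(fasta, is_protein=False):
    """Command to index a FASTA file with formatdb.

    -i names the input file; -p is T for protein, F for nucleotide.
    """
    return ["formatdb", "-i", str(fasta), "-p", "T" if is_protein else "F"]


def blastall_cmd(program, database, query, output):
    """Command to search a formatted database with blastall."""
    return ["blastall", "-p", program, "-d", str(database),
            "-i", str(query), "-o", str(output)]


def build_pipeline(archive, query, output, program="blastn"):
    """Return the commands for steps 2-4, in order, for one archive."""
    fasta = Path(archive).with_suffix("")  # example.seq.Z -> example.seq
    return [
        uncompress_cmd(archive),
        formatdb_cmd(fasta),
        blastall_cmd(program, fasta, query, output),
    ]


def run(commands, dry_run=True):
    """Print each command, and execute it unless dry_run is set."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Hypothetical file names; dry run only prints the commands.
    run(build_pipeline("example.seq.Z", "query.fa", "hits.txt"))
```

Building the commands as lists keeps the dry run separate from the real
run, so the sequence can be inspected before anything touches the data.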

There is also a range of commercial products which do this, from
(at least) Molecular Applications Group, Pangea, Perkin-Elmer (once
MII), and, ummm, the Darwin finch company (their name is the Latin
species name for finch).  Each provides a different level of solution.

> 2. Are the steps in option A,B and C correct or am I missing some
> steps there?

  "fmerge" does not download the update files.  You'll need some
sort of mirroring script to get the new data files from NCBI to
your machines.  Then do the fmerge on the new data sets.  You'll
also need to do a "clean" update every month or so because some sequences
are changed several times, and fmerge will merge all the different
versions, as I recall.
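
A mirroring script of the kind mentioned above might look like the
following sketch, which uses Python's standard ftplib. The host,
remote directory and local directory are parameters because the
actual NCBI locations are not given here; the values in the usage
note are placeholders, not real paths. It only fetches files whose
names are missing locally, so it will not notice a remote file that
changed but kept its name.

```python
from ftplib import FTP
from pathlib import Path


def files_to_fetch(remote_names, local_dir, suffix=".Z"):
    """Return remote names ending in suffix that are not in local_dir."""
    local_path = Path(local_dir)
    local = {p.name for p in local_path.iterdir()} if local_path.is_dir() else set()
    return sorted(n for n in remote_names if n.endswith(suffix) and n not in local)


def mirror(host, remote_dir, local_dir):
    """Download every missing file from remote_dir on an anonymous FTP server."""
    Path(local_dir).mkdir(parents=True, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()          # anonymous login
        ftp.cwd(remote_dir)
        wanted = files_to_fetch(ftp.nlst(), local_dir)
        for name in wanted:
            with open(Path(local_dir) / name, "wb") as out:
                ftp.retrbinary("RETR " + name, out.write)
    return wanted
```

A call such as mirror("ftp.example.org", "/path/to/updates", "updates")
would return the list of newly downloaded files, which are the ones to
uncompress and hand to fmerge.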


> 3. Can I run Blast against a file which is in FASTA format or I have
> to convert it to Blast format?

  No, BLAST cannot search a plain FASTA file directly.  You must
convert it first with formatdb.

						Andrew



