(none)

David Kristofferson kristoff at genbank.bio.net
Wed Sep 18 20:19:51 EST 1991


>    I have made a blastp search of two viral sequences (enzyme and structural
> protein) against Swiss-prot, PIR and GenBank databases and it worked very well
> except for the GenBank search for which I received the message:  couldn't open
> database file "/blast/db/genbank.head".  I also made the same search with
> FASTA (against Swiss-prot and GenPept/all) and that also worked.

blastp is only for use with protein databanks.  You have to use the
blastn program (with a nucleic acid sequence!) to search GenBank.  I
agree that we could make the output of this error a little more
"user-friendly."
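
To illustrate the kind of check that would give a friendlier message, here
is a minimal Python sketch.  It is not how the BLAST server is implemented;
the database table, the function name check_search and the wording of the
message are all assumptions made for illustration.  Only the pairing of
programs with sequence types (blastp: protein against protein, blastn:
nucleotide against nucleotide, tblastn: protein query against a nucleotide
database) reflects how the programs work.

```python
# (query type, database type) expected by each program.
# Only the three programs named in this message are listed.
PROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastn": ("nucleotide", "nucleotide"),
    "tblastn": ("protein", "nucleotide"),  # database is translated during the search
}

# Hypothetical lookup table; the keys are illustrative, not actual server names.
DATABASES = {
    "swissprot": "protein",
    "pir": "protein",
    "genpept": "protein",
    "genbank": "nucleotide",
}


def check_search(program, database):
    """Return None if the program can search the database, else a readable message."""
    program = program.lower()
    database = database.lower()
    if program not in PROGRAMS:
        return f"Unknown program '{program}'."
    if database not in DATABASES:
        return f"Unknown database '{database}'."
    _, needed = PROGRAMS[program]
    actual = DATABASES[database]
    if actual != needed:
        # List the programs whose database type matches the requested database.
        suitable = sorted(p for p, (_, d) in PROGRAMS.items() if d == actual)
        return (
            f"{program} searches {needed} databases, but '{database}' is a "
            f"{actual} database. Programs that can search it: {', '.join(suitable)}."
        )
    return None


print(check_search("blastp", "genbank"))
```

A message along these lines would point the user at blastn or tblastn
instead of reporting a missing database file.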


>    The question I asked myself looking at those search results is:  "why is
> the same search algorithm giving the same result for two databases but in
> different order... if it is the same search algorithm wouldn't it be more
> logical to find the same sequences in the same order?"

Do you mean that the "same" sequences in PIR and SWISS-PROT are showing
up in a different order in your BLAST output?  My first question would
be:  are you sure that the sequences in PIR and SWISS-PROT are
**identical**?  Even though the names may be the same, it sometimes
happens that one database may have a correction or additional sequence
information that didn't get incorporated into the other.  This would
obviously affect the scoring.
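
One quick way to test this is to compare the two entries residue by
residue.  The following is a minimal Python sketch, assuming you have
already pasted the two sequences in as plain strings; the function names
are hypothetical.  Note that it compares position by position and does not
align the sequences, so a single inserted residue will make everything
after it look different.

```python
def normalize(seq):
    """Remove whitespace and line breaks, and fold to upper case."""
    return "".join(seq.split()).upper()


def compare_entries(seq_a, seq_b):
    """Report whether two sequences are identical and where they first disagree."""
    a = normalize(seq_a)
    b = normalize(seq_b)
    # 1-based positions where the residues differ, over the shared length only.
    mismatches = [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return {
        "identical": a == b,
        "length_a": len(a),
        "length_b": len(b),
        "mismatch_positions": mismatches,
    }
```

If the lengths differ or any mismatch positions are reported, the two
entries are not the same sequence and should not be expected to score
identically.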

>    Comparing FASTA and Blastp searches:  both gave good results for my
> searches.  They found all the other sequences I knew that were homologous.
> Is it possible that Swiss-prot database is more complete than the PIR one?

A common cause of differences between the various databanks is simply
the release date.  The most recently released one will generally have
more information.  Of course, this doesn't explain all differences, but
this reason is frequently overlooked.

> I cannot tell for GenBank since I had either no result (with Blastp) or I
> couldn't interpret the result since it was given in "codes" like ADBCG_20 or
> ACNLKTAC_2 (with FASTA).  Is there a way I can find out what those codes
> correspond to besides from asking the sequences from GenBank?  

It's easier to simply pop all of the LOCUS names into a mail message
and send it to retrieve at genbank.bio.net.  You should be able to cut
and paste this info out of your results into another mail message
unless you are hobbled by use of an old character-based terminal.
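
If you would rather not cut and paste by hand, the names can be gathered
with a few lines of Python.  This is only a sketch: it assumes each hit
line in your saved results begins with the name followed by white space,
and the sample lines below are made up for illustration.  It produces one
name per line; check the retrieve server's instructions for the exact form
of the request it expects, which is not covered here.

```python
def collect_names(result_lines):
    """Return the first token of each non-blank line, without duplicates, in order."""
    names = []
    seen = set()
    for line in result_lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        name = tokens[0]
        if name not in seen:
            seen.add(name)
            names.append(name)
    return names


# Made-up hit lines; only the names are taken from the question above.
sample = [
    "ADBCG_20    illustrative description    120",
    "",
    "ACNLKTAC_2  illustrative description     98",
    "ADBCG_20    illustrative description     75",
]

# One name per line, ready to paste into the body of a mail message.
body = "\n".join(collect_names(sample))
print(body)
```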

> Finally,
> how can I succeed a search against GenBank with Blastp?  (I used:  DATALIB
> genbank)

You can't.  A new program called TBLASTN will let you search a protein
sequence against GenBank, but we don't have that program up yet.  I
see that NCBI has just announced a new release of the BLAST programs.
We will be putting those up in the near future.

				Sincerely,

				Dave Kristofferson
				GenBank Manager

				kristoff at genbank.bio.net



