nucleotide sequence searching via the Internet

Curt Ashendel ashendel at aclcb.purdue.edu
Wed Mar 2 14:43:46 EST 1994


On 1 Mar 1994 21:47:30 GMT, 
Kanade Shinkai  <kshinka1 at swarthmore.edu> wrote:

>We have a short invertebrate nucleotide sequence that we wish to run
>against the various nucleotide data bases available on the Internet. 
>We are aware that methods exist for doing this via e-mail (e.g.,
>NCBI-BLAST, FASTA, QUICKSEARCH, etc.) but have no experience doing
>this.  We have downloaded the help files which explain how to format
>search queries.  However, we are not familiar with the theoretical
>bases for these algorithms, and, therefore, do not understand the
>meaning of the various parameters used, and how they relate to more
>straightforward sequence properties such as homology.
>
>Can anyone recommend how we should proceed from here?
>Kanade Shinkai

I know there are references in the help files and there has been a recent 
post with a lit citation that answers this question very directly.  
However, I can recommend an alternative for NCBI-BLAST, if you have the 
time.  I am assuming you want to determine if any database sequence is a 
significant match for any part of your query sequence.  Try sending your 
sequence with a guess for the EXPECT parameter (e.g., 35) and no other 
parameters (except perhaps ALIGNMENTS 20 to prevent overly long responses) 
and see what you get.  At night I get the responses for peptide queries 
in less than 5 minutes, usually 2 min.  If you don't get any matches above 
the cutoff score, then increase the EXPECT parameter.  You can look at the 
resulting histogram to see the frequency of hits at each score; the exact 
amount you will need to increase EXPECT to get them listed depends on the 
length of the query sequence.  The histogram has VERY useful data, as it 
tells you whether there are outliers with much higher scores than the 
heap of sequences (which are not likely to be significant matches).  If you 
cannot get the hang of this for your sequence by playing with the EXPECT 
parameter, then try it with a similar-length sequence from a known gene/cDNA 
(obtained from GenBank or the literature).  It is helpful to pick a gene 
that has more than one database entry (other species, other members of the 
gene family, etc.).  Alternatively, you could make up a sequence at random and 
see what chance matches look like.  These exercises will train you to interpret the 
BLAST results with only a few trial runs.  Although this takes some time 
(an hour or so), it will be useful if you intend to do this again.  The 
NCBI load may go through the roof if everyone takes this advice, so 
please use off-peak times (you also get faster replies that way).  
Also, remember that NCBI rules stipulate that you have no more than one 
query in the queue at a time, so wait for the reply before sending another.
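
To make the link between EXPECT, the cutoff score, and query length 
concrete, here is a rough sketch in Python.  It assumes the standard 
Karlin-Altschul relation for ungapped local alignments, 
E = K * m * n * exp(-lambda * S), where m is the query length and n is the 
database length.  The function name, the K and lambda defaults, and the 
database size are illustrative placeholders, not the values the NCBI 
server actually uses.

```python
import math

def cutoff_score(expect, query_len, db_len, lam=0.19, k=0.18):
    """Score S at which the expected number of chance hits equals `expect`.

    Solves E = k * m * n * exp(-lam * S) for S.  The defaults for lam and k
    are illustrative placeholders, not NCBI's actual values.
    """
    return math.log(k * query_len * db_len / expect) / lam

DB_LEN = 150_000_000  # assumed database size in residues (placeholder)

# Tabulate the cutoff for a few query lengths and EXPECT settings.
for m in (50, 200, 1000):
    for e in (10, 35, 100):
        s = cutoff_score(e, m, DB_LEN)
        print(f"query length {m:5d}  EXPECT {e:4d}  cutoff score ~ {s:6.1f}")
```

Under these assumptions, raising EXPECT lowers the cutoff score, and a 
longer query raises it, which is why the increase needed to get 
lower-scoring hits listed depends on the length of the query.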

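For the random-sequence control suggested above, a minimal Python sketch 
follows.  The helper names are hypothetical, equal base frequencies are 
an assumption (real genomes are often biased), and the FASTA-style 
wrapping is only a guess at a convenient layout; check the server's help 
file for the query format it expects.

```python
import random

def random_nucleotides(length, seed=None):
    """Return a random DNA string with equal A/C/G/T probabilities."""
    rng = random.Random(seed)  # a fixed seed makes the control reproducible
    return "".join(rng.choice("ACGT") for _ in range(length))

def as_fasta(name, seq, width=60):
    """Format a sequence as a '>' header plus lines of at most `width` bases."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines)

# Make the control about the same length as the real query.
print(as_fasta("random_control", random_nucleotides(150, seed=1)))
```

Hits returned for such a sequence are chance matches by construction, so 
their scores give a feel for where the "heap" in the histogram sits.
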
This worked for me; if it is bad advice, please let me know, but be 
kind.  ;-)
Curt Ashendel
Purdue University
West Lafayette, IN
ashendel at aclcb.purdue.edu


