Database search with multiple unordered peptide sequences?

Warren Gish gish at host.nlm.nih.gov
Fri Jun 12 13:14:46 EST 1992


The current versions of the BLAST programs support the use of a gap
character, '-' (hyphen), to separate disjoint segments in either the
FASTA-format database or the FASTA-format query file.  This feature was
added to BLASTP in version 1.2.5 (March 1992).
 
Once you've got access to BLASTP 1.2.5+ and linked the peptides together
with intervening '-' characters in a single file, do the following:
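
For illustration, here is a minimal Python sketch of linking the peptides
into one FASTA-format query.  The peptide strings, the sequence name and
the helper names (join_peptides, to_fasta) are made up for the example;
only the use of '-' as the separator comes from the description above.

```python
def join_peptides(peptides, gap="-"):
    """Join peptide fragments into one sequence, separated by the gap character."""
    # Drop empty entries and surrounding whitespace, and normalize case.
    cleaned = [p.strip().upper() for p in peptides if p.strip()]
    return gap.join(cleaned)


def to_fasta(name, sequence, width=60):
    """Format a sequence as a FASTA record with a fixed line width."""
    lines = [">" + name]
    for i in range(0, len(sequence), width):
        lines.append(sequence[i:i + width])
    return "\n".join(lines) + "\n"


# Hypothetical peptide fragments, in no particular order.
peptides = ["MKTAYIAK", "GLSDGEWQ", "VHLTPEEK"]
query = join_peptides(peptides)  # "MKTAYIAK-GLSDGEWQ-VHLTPEEK"
fasta_text = to_fasta("linked_peptides", query)
print(fasta_text, end="")
```

Writing fasta_text to a file gives the single query file mentioned above.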
 
(1) Invoke blastp with a very low cutoff score (e.g., S=30).
(2) Limit the amount of output to something reasonable using BLASTP's
    V=# and B=# options.  The value of V is the limit on the number
    of one-line descriptions reported at the beginning of the output;
    the value of B is the limit on the number of database sequences for
    which alignments are reported.
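
The two steps can be sketched as a small Python helper that assembles
the command line.  The S=, V= and B= option forms are the ones described
above; the positional order (database, then query file), the names
"mydb" and "linked_peptides.fa", and the limits of 50 are assumptions
made for the example, so check them against your local installation.

```python
def blastp_args(database, query_file, cutoff=30,
                max_descriptions=50, max_alignments=50):
    """Build an argument list for blastp with a low cutoff and capped output.

    Assumption: the program takes the database and query file as positional
    arguments, followed by NAME=VALUE options.
    """
    return [
        "blastp",
        database,
        query_file,
        f"S={cutoff}",            # very low cutoff score
        f"V={max_descriptions}",  # limit on one-line descriptions
        f"B={max_alignments}",    # limit on database sequences with alignments
    ]


# Hypothetical database and query file names.
args = blastp_args("mydb", "linked_peptides.fa")
print(" ".join(args))
```

The helper only builds the command; it does not run anything.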
 
The output from the BLAST programs is sorted by Poisson P-value, which
takes into consideration any multiple hits, without respect to their
order.  Ordinarily, the fact that the BLAST programs don't check the
matching segments for consistency within a single global alignment is
viewed as a deficiency when determining the N parameter to the Poisson
statistics, but for your purposes that deficiency becomes a feature!
The probabilities reported, however, should be discounted because the
statistical model is not very accurate for such short sequences.
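
To give a feel for how multiple hits enter a Poisson P-value, here is a
sketch of the textbook Poisson tail: the chance of seeing at least n
matching segments when `expected` such segments are expected at random.
This is not taken from the BLAST source and may differ from what the
programs actually compute; the function name and the input values are
hypothetical.

```python
import math


def poisson_tail(expected, n):
    """Return P(X >= n) for X ~ Poisson(expected)."""
    # Sum the probabilities of seeing 0 .. n-1 events, then take the complement.
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(n)
    )
    return 1.0 - below


# With the same (hypothetical) expected count, requiring more
# segments gives a smaller tail probability.
for n in (1, 2, 3):
    print(n, poisson_tail(0.05, n))
```

Note that the formula depends only on the number of segments, not on
their order, which is the property being exploited here.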
 
--W
 
-- 
  Warren Gish                           phone:  (301) 496-2475
  Staff Fellow                          FAX:  (301) 480-9241
  National Center                       Internet:  gish at ncbi.nlm.nih.gov
     for Biotechnology Information



