Database search with multiple unordered peptide sequences?
gish at host.nlm.nih.gov
Fri Jun 12 13:14:46 EST 1992
The current versions of the BLAST programs support the use of a gap
character, '-' (hyphen), to separate disjoint segments in either the
FASTA-format database or the FASTA-format query file. This feature was
added to BLASTP in version 1.2.5 (March 1992).
Once you have access to BLASTP 1.2.5 or later and have joined the peptides
in a single file with intervening '-' characters, do the following:
(1) invoke blastp with a very low cutoff score (e.g., S=30).
(2) limit the amount of output to something reasonable using BLASTP's
V=# and B=# options. The value of V is the limit on the number
of one-line descriptions reported at the beginning of the output;
the value of B is the limit on the number of database sequences for
which alignments are reported.
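As a rough illustration of the preparation step and the two numbered steps above, here is a minimal Python sketch. It joins the peptides with '-' into one FASTA-format query and assembles a blastp argument list. The helper names, the file name, the title line, the V and B values, and the order of the database and query arguments are all assumptions made for illustration; only S=30 and the V=# and B=# option forms come from the text, so check the argument order against your local BLAST documentation.

```python
def join_peptides(peptides, gap="-"):
    """Join peptide fragments into one sequence separated by the gap character."""
    cleaned = [p.strip().upper() for p in peptides if p.strip()]
    return gap.join(cleaned)


def write_query(path, peptides, title="peptides", width=60):
    """Write the joined peptides as a single FASTA-format query file."""
    seq = join_peptides(peptides)
    with open(path, "w") as fh:
        fh.write(f">{title}\n")
        # Wrap the sequence at a fixed width, as is conventional for FASTA.
        for i in range(0, len(seq), width):
            fh.write(seq[i:i + width] + "\n")


def blastp_command(database, query_path, s=30, v=50, b=50):
    """Build the argument list; database-then-query order is an assumption."""
    return ["blastp", database, query_path, f"S={s}", f"V={v}", f"B={b}"]


if __name__ == "__main__":
    # Hypothetical peptide fragments, in no particular order.
    fragments = ["ACDEFGHIK", "LMNPQRSTVW", "GHIKLMNP"]
    write_query("peptides.fa", fragments)
    # "mydb" is a placeholder database name.
    print(" ".join(blastp_command("mydb", "peptides.fa")))
```

The command is only printed here, not executed, so the sketch can be tried without a BLAST installation.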
The output from the BLAST programs is sorted by Poisson P-value, which
takes into consideration any multiple hits, without respect to their
order. Ordinarily, the fact that the BLAST programs don't check whether
the matching segments are consistent with a single global alignment
when determining the N parameter for the Poisson statistics is viewed
as a deficiency, but for your purposes this deficiency becomes a
feature! The probabilities reported, however, should be discounted
because the statistical model is not very accurate for such short
sequences.
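To give a feel for why multiple hits matter, the sketch below computes the probability of seeing at least N events under a Poisson distribution with a given mean. This is the general form of a Poisson tail probability, not necessarily the exact calculation inside BLASTP; the function name and the example mean are assumptions chosen for illustration.

```python
import math


def poisson_p_at_least(n_hits, expected):
    """P(at least n_hits events) for a Poisson variable with the given mean."""
    if n_hits < 1:
        return 1.0
    # Sum the probabilities of 0 .. n_hits-1 events, then take the complement.
    below = sum(
        math.exp(-expected) * expected ** i / math.factorial(i)
        for i in range(n_hits)
    )
    return 1.0 - below


if __name__ == "__main__":
    mean = 0.5  # hypothetical expected number of chance hits at the cutoff
    for n in (1, 2, 3):
        print(n, poisson_p_at_least(n, mean))
```

For a fixed mean, the probability falls as N grows, which is why several independent peptide hits in one database sequence can rank well even when each hit alone is weak.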
Warren Gish                      phone: (301) 496-2475
Staff Fellow                     FAX: (301) 480-9241
National Center                  Internet: gish at ncbi.nlm.nih.gov
for Biotechnology Information