Improved mass search server at cbrg@inf.ethz.ch

Gaston Gonnet gonnet at inf.ethz.ch
Sat Jan 15 13:04:47 EST 1994


The automatic server at cbrg at inf.ethz.ch has had its MassSearch
functions significantly revised.  The highlights are:

 o Handling multiple digestions for a single protein.
 o Searching against a peptide or nucleotide database.
 o Allowing modification of amino acids (as when treated with
   reagents).
 o Correct treatment of digestions with CNBr and TrypsinCysModified.
 o An approximate mass of the searched protein can be provided to
   increase the sensitivity of the searching score.
 o Accepting deuterated proteins.
 o The speed of the searching has been significantly improved.

For more information, send the message (a single-line body)

	help MassSearch
(or just "help All")

to	cbrg at inf.ethz.ch

Part of the help file for this improved facility is shown next.

-----------------------------------------------------------------
In some cases, a protein can be recognized by fragmenting it
according to a certain pattern and using the molecular weights of
the fragments as a trace.  This method is not effective for
determining the composition of an unknown protein, but it is
effective for identifying an unknown sample if its sequence is
recorded in a protein database.

One of the ways of breaking a protein into smaller pieces
according to a certain pattern is by using enzymes which digest
the protein.  For example, trypsin breaks a protein after every
Arginine (R) or after every Lysine (K) not followed by a Proline
(P).  AspN breaks a protein before every Aspartic acid (D).  A table
of recognized enzymes and their cleavage rules is given below.
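
The two cleavage rules just described can be sketched in Python as
follows.  This is only an illustration, not the server's code: the
function name digest is hypothetical, and it assumes that the "not
followed by a Proline" condition applies to both Arginine and Lysine.

```python
def digest(sequence, enzyme):
    """Split a one-letter protein sequence into fragments for one enzyme.

    Only the two rules quoted in the text are covered; anything else
    is rejected.  (Hypothetical helper, not part of the server.)
    """
    if enzyme not in ("Trypsin", "AspN"):
        raise ValueError("unsupported enzyme: " + enzyme)

    cuts = []  # a cut at i falls between sequence[i-1] and sequence[i]
    for i in range(1, len(sequence)):
        before, after = sequence[i - 1], sequence[i]
        if enzyme == "Trypsin":
            # Cut after R or K unless the next residue is P
            # (assumed here to apply to both R and K).
            if before in "RK" and after != "P":
                cuts.append(i)
        else:
            # AspN: cut before every D.
            if after == "D":
                cuts.append(i)

    bounds = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


print(digest("AKGRPDLKDE", "Trypsin"))
print(digest("AKGRPDLKDE", "AspN"))
```

In the made-up sequence above, the R at position 4 is followed by P,
so trypsin does not cut there.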

The molecular weights of the fragments can be measured by mass
spectrometry to a good level of accuracy.  More importantly, these
methods typically require very small samples, on the order of
fractions of a picomole.

The problem of identifying a sampled protein can be reduced to
digesting the protein with an enzyme, finding the molecular
weights of each of the pieces and then comparing this set of
weights to what would be obtained from the digestion of each
protein in the database.  The process can be repeated with several
different enzymes to increase its selectivity.
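
The comparison step can be sketched as below.  This is a rough
illustration under stated assumptions, not the scoring used by the
server: it uses average residue masses (the text does not say which
mass table the server uses), a fixed tolerance of 1.0, and a plain
count of matched weights instead of a significance score.  All names
are hypothetical.

```python
# Average residue masses in daltons (an assumption; the server's
# own table is not given in the text).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water is added per free peptide


def peptide_mass(peptide):
    """Average mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, peptides, tolerance=1.0):
    """Count observed weights lying within tolerance of any fragment."""
    theoretical = [peptide_mass(p) for p in peptides]
    return sum(
        1 for w in observed
        if any(abs(w - t) <= tolerance for t in theoretical)
    )


def rank_candidates(observed, database, tolerance=1.0):
    """Sort database entries by matched-weight count, best first.

    database maps an entry name to its list of digestion fragments.
    """
    scored = [
        (name, count_matches(observed, fragments, tolerance))
        for name, fragments in database.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A raw count favours long proteins, which yield many fragments, so a
real search needs a score that corrects for the number of fragments.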

The function MassSearch locates the best candidates in a protein
database (SwissProt at this time) that would fit the given weights
once digested by the given enzyme.  The function DNAMassSearch
locates the best candidates in a DNA database (EMBL at this time)
that would encode a protein that would fit the given weights.

This type of searching has been found particularly useful in the
following circumstances:

o To identify proteins when the amount available is very small,
 for example the amounts that can be separated on 2D gels.
o To determine whether an unknown protein is already known in the
 database before spending a significant effort in sequencing.
o To identify more than one protein which cannot be separated by
 other means (this method has been successfully used to identify
 two proteins which were digested together).

Increased precision in the searching is obtained when more than
one digestion is available.  In general it is much better to
perform 2 digestions with different enzymes (with half of the
material and hence at a slightly lower accuracy) than a single
digestion with all the material.  The precision of the retrieval
increases with the number of digestions available.

The template of the body of the message to be sent to
cbrg at inf.ethz.ch is (between but not including the dashed lines):

---------------------------------------------------------------------
MassSearch
Trypsin: 1264.8, 1520.2, 955.9, 2487.0, 1094.1
AspN: 1624.4, 2961.4, 718.8, 716.9, 1890.0
---------------------------------------------------------------------

The token "MassSearch" indicates the operation to be run: a search
of a mass profile against a protein database.  The following lines
contain the name of the digester enzyme followed by the weights.
The weights can be separated by spaces, commas, tabs or newlines
as convenient, but no other extraneous characters are allowed.
Each request may contain more than one digestion.  These multiple
digestions will be understood to be on the same protein, so each
digester name can appear only once.
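
The format rules above can be checked with a small parser before a
request is sent.  The sketch below is hypothetical and covers only
the template shown here (the "MassSearch" token followed by
"Enzyme: weights" entries); other options accepted by the server
are not handled.

```python
import re


def parse_request(body):
    """Parse a MassSearch request body into {digester: [weights]}."""
    lines = [ln.strip() for ln in body.splitlines() if ln.strip()]
    if not lines or lines[0] != "MassSearch":
        raise ValueError("request must start with the MassSearch token")

    digestions = {}
    current = None
    for line in lines[1:]:
        if ":" in line:
            # A new digestion starts: "Name: w1, w2, ..."
            name, line = line.split(":", 1)
            current = name.strip()
            if current in digestions:
                raise ValueError("digester listed twice: " + current)
            digestions[current] = []
        if current is None:
            raise ValueError("weights given before any digester name")
        # Weights may be separated by spaces, commas, tabs or newlines.
        for token in re.split(r"[,\s]+", line.strip()):
            if token:
                # float() raises ValueError on extraneous characters.
                digestions[current].append(float(token))
    return digestions


example = """MassSearch
Trypsin: 1264.8, 1520.2, 955.9, 2487.0, 1094.1
AspN: 1624.4, 2961.4, 718.8, 716.9, 1890.0
"""
print(parse_request(example))
```

Weights that continue on the next line are attached to the most
recently named digester.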

The output of the above request is:

 Searching on SwissProt version 26.  The sequences are printed in
decreasing order of significance.  Scores lower than 90 are probably
not significant.
For digester Trypsin, the fragment weights were:
         1264.8 1520.2  955.9 2487.0 1094.1
For digester AspN, the fragment weights were:
         1624.4 2961.4  718.8  716.9 1890.0


Score  n k  n k   AC      DE                   OS
143.9  7 5  8 3 P02594;  CALMODULIN.   ELECTROPHORUS ELECTRICUS (ELECTRIC EEL).
143.9  7 5  8 3 P02593;  CALMODULIN.   HOMO SAPIENS (HUMAN), ORYCTOLAGUS
                         CUNICULUS (RABBIT), BOS TAURUS (BOVINE), RATTUS
                         NORVEGICUS (RAT), GALLUS GALLUS (CHICKEN), XENOPUS
                         LAEVIS (AFRICAN CLAWED FROG), ONCORHYNCHUS SP. (SALMON),
                         AND ARBACIA PUNCTULATA (PUNCTUATE SEA URCHIN).
112.6  7 4  8 2 P21251;  CALMODULIN.   STICHOPUS JAPONICUS (SEA CUCUMBER).
 94.7 21 3 22 2 P07265;  MALTASE (EC 3.2.1.20).   SACCHAROMYCES CARLSBERGENSIS
                         (LAGER BEER YEAST).
 94.2  7 4  8 2 P07181;  CALMODULIN.   DROSOPHILA MELANOGASTER (FRUIT FLY),
                         LOCUSTA MIGRATORIA (MIGRATORY LOCUST), AND APLYSIA
                         CALIFORNICA (CALIFORNIA SEA HARE).
 .  .  .  .  .

Best wishes, Gaston Gonnet, ETH Zurich.


