
[ANNOUNCE] MOWSE database email server

Alan Bleasby ajb at s-crim1.dl.ac.uk
Thu Aug 26 18:48:47 EST 1993


A peptide mass fingerprint email server is now available
by emailing
     mowse at dl.ac.uk
The help file, available from this address, is reproduced below and
describes the database and how to access it. The service identifies
known proteins from a set of peptide molecular weights determined by
mass spectrometry after proteolytic digestion.

Alan Bleasby
SERC Daresbury Laboratory



		********************************		
		The MOWSE peptide mass database:
		********************************

		Imperial Cancer Research Fund

			and

		SERC Daresbury Laboratory

        	D.J.C. Pappin, P. Hojrup and A.J. Bleasby
		'Rapid Identification of Proteins by
		Peptide-Mass Fingerprinting'.
       	 	Current Biology (1993), vol 3, 327-332.

		InterNet server version:


Table of Contents:

	[1] Background.

	[2] Construction of the MOWSE database.

		[2.1] Source database.
		[2.2] Calculation of Molecular weight fragments.

	[3] Running database searches via e_mail.

	[4] Example of mail query format.
	
	[5] Results listing.

	[6] Database structure.

		[6.1] MOWSE database structure.
		[6.2] The MW primary fragment molecular weight file.
		[6.3] The MDX file OWL entry index.
		[6.4] The SMW whole sequence molecular weight file.
		[6.5] Program Requirements.
		[6.6] MOWSE Scoring scheme.
		[6.7] Simulation studies.

	[7] General references.

[1] Background:

	Determination of molecular weight has always been an 
important aspect of the characterization of biological molecules. 
Protein molecular weight data, historically obtained by SDS gel 
electrophoresis or gel permeation chromatography, has been used to 
establish purity, detect post-translational modification (such as 
phosphorylation or glycosylation) and aid identification. Until 
just over a decade ago, mass spectrometric techniques were typically 
limited to relatively small biomolecules, as proteins and nucleic 
acids were too large and fragile to withstand the harsh physical 
processes required to induce ionization. This began to change with 
the development of 'soft' ionization methods such as fast atom 
bombardment (FAB)[1], electrospray ionisation (ESI) [2,3] and 
matrix-assisted laser desorption ionisation (MALDI)[4], which can 
effect the efficient transition of large macromolecules from 
solution or solid crystalline state into intact, naked molecular 
ions in the gas phase. As an added bonus to the protein chemist, 
sample handling requirements are minimal and the amounts required 
for MS analysis are in the same range as, or less than, those needed 
for existing analytical methods.
	As well as providing accurate mass information for intact 
proteins, such techniques have been routinely used to produce 
accurate peptide molecular weight 'fingerprint' maps following 
digestion of known proteins with specific proteases. Such maps 
have been used to confirm protein sequences (allowing the 
detection of errors of translation, mutation or insertion), 
characterise post-translational modifications or processing events 
and assign disulphide bonds [5,6]. 
	Less well appreciated, however, is the extent to which such 
peptide mass information can provide a 'fingerprint' signature 
sufficiently discriminating to allow for the unique and rapid 
identification of unknown sample proteins, independent of other 
analytical methods such as protein sequence analysis. 
	The following text describes the construction and use 
of the MOWSE peptide mass database (for MOlecular Weight SEarch) 
at the SERC Daresbury Laboratory. Practical experience has shown 
that sample proteins can be uniquely identified using as few as 3-4 
experimentally determined peptide masses when screened against a 
fragment database derived from over 50,000 proteins. Experimental 
errors of a few Daltons are tolerated by the scoring algorithms, 
permitting the use of inexpensive time-of-flight mass 
spectrometers. As with other types of physical data, such as amino 
acid composition or linear sequence, peptide masses can clearly 
provide a set of determinants sufficiently unique to identify or 
match unknown sample proteins. Peptide mass fingerprints can prove 
as discriminating as linear peptide sequence, but can be obtained 
in a fraction of the time using less material. In many cases, this 
allows for a rapid identification of a sample protein before 
committing to protein sequence analysis. Fragment masses also 
provide structural information, at the protein level, fully 
complementary to large-scale DNA sequencing or mapping projects 
[7,8,9].

[2] Construction of the MOWSE database.

[2.1] Source database.

	MOWSE was created from the OWL non-redundant composite 
protein sequence database [10,11]. The first InterNet release (version 
20.1) contains some 61,000 protein entries, generating approximately
15,000,000 peptide fragments. The MOWSE fragment database will be updated
with each new release of the parent OWL database (every 2 months or so).
                              
[2.2] Calculation of Molecular weight fragments.

	For each entry in the source OWL database, MOWSE derives both 
whole sequence molecular weight and calculated peptide molecular 
weights for complete digests using the range of cleavage reagents 
and rules detailed in Table 1. Cleavage is disallowed if the 
target residue is followed by proline (except for CNBr or Asp N). 
Glu C (S. aureus V8 protease) cleavages are also inhibited if the 
adjacent residue is glutamic acid.  Peptide mass calculations are 
based entirely on the linear sequence and use the average isotopic 
masses of amide-bonded amino acid residues (IUPAC 1987 relative 
atomic masses). To allow for N-terminal hydrogen and C-terminal 
hydroxyl the final calculated molecular weight of a peptide of N 
residues is given by the equation:

	MW = (sum of residue masses for n = 1 to N) + 18.0153

	Molecular weights are rounded to the nearest integer value 
before being entered into the database. Cysteine residues are 
calculated as the free thiol, anticipating that samples are 
reduced prior to mass analysis. CNBr fragments are calculated as 
the homoserine lactone form. Information relating to post-
translational modification (phosphorylation, glycosylation etc.) 
is not incorporated into calculation of peptide masses.
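
The calculation described above can be sketched in Python as follows. This is an illustration only, not the MOWSE source: the function name peptide_mass is hypothetical, and the residue masses are commonly tabulated average values that may differ in the last decimal places from the IUPAC 1987 figures MOWSE used. The homoserine lactone adjustment for CNBr fragments is not modelled.

```python
# Average residue masses in Daltons (commonly tabulated values; an assumption,
# not necessarily identical to the table used by MOWSE).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}

# N-terminal hydrogen plus C-terminal hydroxyl, as given in the text.
TERMINAL_GROUPS = 18.0153


def peptide_mass(sequence):
    """Return the integer average mass of a linear, unmodified peptide."""
    total = sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + TERMINAL_GROUPS
    # Round half up to the nearest integer, as the database stores integers.
    return int(total + 0.5)
```

For example, peptide_mass("GG") sums two glycine residues (114.1038) and the terminal groups to give 132.1191, which is stored as 132.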
 
				
Reagent no.	Reagent			Cleavage rule	
				
	1	Trypsin			C-term to K/R
	2	Lys-C			C-term to K
	3	Arg-C			C-term to R
	4	Asp-N			N-term to D
	5	V8-bicarb		C-term to E
	6	V8-phosph		C-term to E/D
	7	Chymotrypsin		C-term to F/W/Y/L/M
	8	CNBr			C-term to M

	Table 1: Cleavage reagents modelled by MOWSE.
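
The rules in Table 1 and the accompanying text can be sketched as a complete-digest function. This is an illustration under stated assumptions rather than the MOWSE implementation: the name digest is hypothetical, 'adjacent residue' in the Glu C rule is read as the residue immediately following the cleavage site, and that rule is applied to both V8 entries.

```python
# Reagent name -> (side of the target residue that is cut, target residues).
RULES = {
    "trypsin": ("C", "KR"),
    "lys-c": ("C", "K"),
    "arg-c": ("C", "R"),
    "asp-n": ("N", "D"),
    "v8-bicarb": ("C", "E"),
    "v8-phosph": ("C", "ED"),
    "chymotrypsin": ("C", "FWYLM"),
    "cnbr": ("C", "M"),
}


def digest(sequence, reagent):
    """Return the fragments of a complete digest of sequence by reagent."""
    key = reagent.lower()
    side, targets = RULES[key]
    seq = sequence.upper()
    cuts = []  # a cut at i separates seq[i-1] from seq[i]
    for i in range(1, len(seq)):
        if side == "N":
            # Asp-N cuts before the target; no proline restriction applies.
            if seq[i] in targets:
                cuts.append(i)
            continue
        target, following = seq[i - 1], seq[i]
        if target not in targets:
            continue
        if following == "P" and key != "cnbr":
            continue  # proline after the target blocks cleavage
        if key.startswith("v8") and following == "E":
            continue  # assumed reading of the Glu C rule
        cuts.append(i)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]
```

With these rules, digest("AKGRPLKMR", "trypsin") cuts after the first K and after the second K but not after R, because R is followed by proline.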


[3] Running database searches by e_mail:

********************************************************************
Search queries should be mailed to mowse at daresbury.ac.uk (short form
mowse at dl.ac.uk). Search results will be returned directly to your
e_mail address. Comments, please, to mbdpn at s-crim1.dl.ac.uk.
********************************************************************

The 'subject' field of your email message is irrelevant - all
parameters must be specified in the body of the message. The relevant
syntax is given below. Some lines are compulsory, others are optional
(see the parameter descriptions below).
All text is case-insensitive, and MOWSE expects integer data. Non-exponential
floating point syntax is acceptable, but MOWSE will round the data to the 
nearest integer. Whitespace is ignored in an intuitive way.

MOWSE recognises the following command lines, which are further
described below:
			Begin
			Reagent
			Tolerance
			SeqMW
			Filter
			Datastart
			Dataend

The order of lines is irrelevant with the exception of 'begin' and the
'datastart/dataend' commands (see below).
If multiple instances of a command occur then only the FIRST instance
will be recognised.
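
The parsing behaviour described above (case-insensitive text, flexible whitespace, rounding of floating point values, first instance of a command wins) can be sketched as follows. This is not the server's code: parse_query is a hypothetical name, the defaults are taken from the parameter descriptions that follow, and the handling of the datastart/dataend block is deliberately left out.

```python
DEFAULTS = {"tolerance": 2, "seqmw": 0, "filter": 25}
REAGENTS = {"trypsin", "lys-c", "arg-c", "asp-n",
            "v8-bicarb", "v8-phosph", "chymotrypsin", "cnbr"}


def parse_query(body):
    """Extract the reagent, tolerance, seqmw and filter settings from a query."""
    params = {}
    started = False
    for line in body.splitlines():
        words = line.split()  # tolerant of extra spaces and tabs
        if not words:
            continue
        key = words[0].lower()  # commands are case-insensitive
        if not started:
            started = key == "begin"  # nothing counts before 'begin'
            continue
        if len(words) < 2:
            continue
        if key == "reagent":
            # setdefault keeps only the FIRST instance of a command
            params.setdefault("reagent", words[1].lower())
        elif key in DEFAULTS:
            # floating point input is rounded to the nearest integer
            params.setdefault(key, int(float(words[1]) + 0.5))
    if params.get("reagent") not in REAGENTS:
        raise ValueError("a supported 'reagent' line is compulsory")
    return {**DEFAULTS, **params}
```

Lines that are not recognised as parameter commands, including any data lines, are simply skipped in this sketch.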

Begin
  Every search query MUST start with a 'begin' line. There should only
  be one 'begin' line and all other commands & data should immediately
  follow.

Reagent
  Every search query MUST specify a 'reagent' line. The word 'reagent'
  must be followed by one of the supported cleavage reagents. These are:

         Trypsin
         Lys-C
         Arg-C
         Asp-N
         V8-bicarb
         V8-phosph
         Chymotrypsin
         CNBr

  A typical reagent line is therefore of the form:

     reagent trypsin

Tolerance
  This line is optional. The supplied number specifies the error
  allowed in the experimentally determined peptide masses. If no
  figure is specified, a default tolerance of 2 Daltons will
  be assumed. If you wish to specify a different tolerance then follow
  the word 'tolerance' with the required number of Daltons e.g.

     tolerance 1

  In this case, supplied peptide masses will be matched to +/- 1
  Dalton. Values of 2-4 are suggested for data obtained by
  laser-desorption TOF instruments. Accuracies of +/- 2 Daltons or better are
  generally only possible using an appropriate internal standard (e.g.
  oxidised insulin B chain) with TOF instruments.
  For electrospray or FAB data, a value of 1 can be selected in most 
  cases. If you have real confidence in mass determination, specify '0' 
  (zero) to limit matches to the nearest integer value (effectively +/- 0.5 
  Daltons). Discrimination is significantly improved by the selection of a 
  small error tolerance.
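
  The effect of the tolerance setting can be illustrated with a short
  Python sketch. The comparison below is an assumption based on the
  description above (integer database masses, supplied masses rounded to
  the nearest integer); the function name within_tolerance is hypothetical
  and the actual matching code used by MOWSE is not shown in this help file.

```python
def within_tolerance(observed, database_mass, tolerance=2):
    """Return True if an observed mass matches an integer database mass."""
    # Supplied masses are rounded to the nearest integer before matching.
    observed_int = int(observed + 0.5)
    # A tolerance of 0 therefore accepts only the nearest integer value,
    # which is effectively +/- 0.5 Daltons on the unrounded mass.
    return abs(observed_int - database_mass) <= tolerance
```

  With a tolerance of 1, an observed mass of 1045.4 matches a database
  fragment of 1046; with a tolerance of 0 it does not, because it rounds
  to 1045.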

SeqMW
  This optional line allows you to give the molecular weight of the whole
  protein (if known). The search is then limited to proteins of this
  molecular weight plus/minus a percentage set by the 'Filter' command
  (see below).
  If unspecified, a whole protein molwt of 0 is assumed which MOWSE
  interprets as "search the whole database". This will include all proteins
  up to the maximum size of just under 700,000 Daltons.
  You can specify any molwt in Daltons with this command e.g.

     SeqMW 90000

Filter
  This optional line is used in conjunction with the SeqMW command and
  is meaningless without it. It specifies a percentage. Only proteins
  of the given SeqMW +/- this percentage will be searched. If a SeqMW
  is specified but Filter is unspecified then Filter will default to
  25%. To specify a percentage of 30% use:

     Filter 30

  In this case, a molecular weight of 90,000 Daltons was
  specified and the selection of 30 for the filter restricts the
  search to those proteins with masses from 63,000 to 117,000
  Daltons. A value of 25 is suggested for initial searches, which
  can be progressively widened for subsequent search attempts if no
  match
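
The arithmetic behind the SeqMW and Filter settings can be sketched as follows. The function name mass_window is hypothetical, and returning None for "search the whole database" is a choice made for this illustration only.

```python
def mass_window(seqmw=0, filter_percent=25):
    """Return the (low, high) protein mass range searched, or None for all."""
    if seqmw == 0:
        return None  # a SeqMW of 0 means the whole database is searched
    delta = seqmw * filter_percent / 100
    return (seqmw - delta, seqmw + delta)
```

For SeqMW 90000 and Filter 30 the margin is 27,000 Daltons, giving the range of 63,000 to 117,000 Daltons quoted above.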

