Search restriction to N-terminal?

Bob MacCallum bob at bsm.bioc.ucl.ac.uk
Thu Nov 28 12:50:16 EST 1996


Harold Drabkin (hdrabkin at mit.edu) wrote:
: Does anyone know of a program that could search the standard sequence
: databases of protein sequences for a particular di or tri peptide
: sequence, restricting the search to just the N-terminal? For example,
: how many proteins begin with V-D, or M-V-D?

If you have FASTA-format libraries and a machine that runs Perl,
save the script below as nterm.pl and run it as
"nterm.pl fasta_library pattern", e.g. with VD or MVD as the pattern.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
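#!/usr/local/bin/perl
use strict;
use warnings;

my $usage = "usage: nterm.pl fasta_library pattern\n";

# This program reads a FASTA library file and prints the header
# (ID) line of every sequence beginning with the desired pattern.
# The pattern is a Perl regular expression matched against the start
# of the first sequence line, so "MVD" finds entries starting M-V-D.
# Matching is case-sensitive; protein libraries usually use upper case.

die $usage unless @ARGV == 2;
my ($database, $pattern) = @ARGV;

open(my $db, '<', $database) or die "can't open $database: $!\n";
my $last_id;
my $n = 0;    # sequence lines read since the last header
while (my $line = <$db>)
{
  if ($line =~ /^>/)
  {
    $last_id = $line;    # remember the header of the current entry
    $n = 0;
    next;
  }
  $n++;
  # only the first sequence line after the header holds the N-terminus
  print $last_id if ($n == 1 && defined $last_id && $line =~ /^$pattern/);
}
close($db);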
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Alternatively, you could count the hits and divide by the total
number of sequences to get a frequency. If you just want a quick
answer, I'll run a couple of queries for you.
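A rough sketch of that counting variant, in the same Perl and assuming the same FASTA layout (a '>' header line followed by sequence lines). The script name ntermcount.pl and the helper count_nterm are hypothetical names introduced here, not part of the original post. It reuses the first-sequence-line test from nterm.pl but tallies hits and sequences instead of printing headers:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: reads FASTA records from $fh and returns
# (sequences whose N-terminus matches $pattern, total sequences).
sub count_nterm
{
  my ($fh, $pattern) = @_;
  my ($hits, $total, $n) = (0, 0, 0);
  while (my $line = <$fh>)
  {
    if ($line =~ /^>/)
    {
      $total++;          # every '>' header starts a new sequence
      $n = 0;
      next;
    }
    next unless $total;  # ignore any text before the first header
    $n++;
    # only the first sequence line after a header holds the N-terminus
    $hits++ if $n == 1 && $line =~ /^$pattern/;
  }
  return ($hits, $total);
}

my $usage = "usage: ntermcount.pl fasta_library pattern\n";

# Act as a command-line tool only when arguments are given, so that
# count_nterm can also be called from other code.
if (@ARGV == 2)
{
  my ($database, $pattern) = @ARGV;
  open(my $db, '<', $database) or die "can't open $database: $!\n";
  my ($hits, $total) = count_nterm($db, $pattern);
  close($db);
  printf "%d of %d sequences begin with %s (%.2f%%)\n",
    $hits, $total, $pattern, $total ? 100 * $hits / $total : 0;
}
elsif (@ARGV)
{
  die $usage;
}
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

Run it as "ntermcount.pl fasta_library MVD"; it prints a single summary line. Like nterm.pl, it only looks at the first sequence line of each entry, which is enough for di- and tripeptide patterns.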



--
++++++++++++++++++++++++++++++ Bob MacCallum ++++++++++++++++++++++++++++++
+++++++++++++++ Biomolecular Structure and Modelling Group ++++++++++++++++
++++++++++++ Department of Biochemistry and Molecular Biology +++++++++++++
++++++++++++++++ University College London, WC1E 6BT, UK ++++++++++++++++++

More information about the Bio-soft mailing list