From owner-srs@net.bio.net Sun Jun 02 23:00:00 1996
Path: biosci!daresbury!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software,bionet.software.gcg,bionet.software.staden,bionet.software.srs
Subject: Re: Software to extract annotation fields from EMBL/GenBank entries.
Date: 03 Jun 1996 15:49:48 GMT
Organization: The Sanger Centre
Lines: 68
Message-ID: <PMR.96Jun3164948@unst.sanger.ac.uk>
References: <b.robertson-0306961409530001@cb-11.sm.ic.ac.uk>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: b.robertson@ic.ac.uk's message of Mon, 03 Jun 1996 14:09:53 +0100
Xref: biosci bionet.software:15679 bionet.software.gcg:1826 bionet.software.staden:196 bionet.software.srs:260

In article <b.robertson-0306961409530001@cb-11.sm.ic.ac.uk> b.robertson@ic.ac.uk (Brian Robertson) writes:
>   The amount of bacterial genome data available as sequenced cosmids of
>   30-40 kb is increasing rapidly. Our problem is that we need to keep track
>   of newly discovered genes as they appear, so they can be incorporated into
>   our research program as appropriate. For this we need to create lists of
>   probable genes identified in the annotations for each cosmid. This can
>   then be circulated to laboratory workers.
>
>   An example of this kind of annotation is shown below. We would like to
>   extract the "/note" field, which contains the probable function of the
>   gene, and create a list of these for each cosmid.

>FT   CDS_pept        complement(3043..4155)
>FT                   /note="MTCY190.03c, probable anthranilate
>FT                   phosphoribosyltransferase, trpD, len: 370, similar to eg
>FT                   SW:TRPD_LACCA P17170, (43.2% identity in 308 aa overlap),
>FT                   initiation codon uncertain, gtg at 4086 favoured by
>FT                   homology but this has no clear ribosome binding site"

Clearly a job for SRS and ICARUS. Try the bionet.software.srs
newsgroup (this message is crossposted there) ...

But beware - these fields often describe homologies rather than
confirmed functional assignments (how firm the assignments are varies
from project to project).

Also, different projects will use different methods for formatting
the initial annotation. There is as yet no consensus on how the data
should be presented.

Then again, you also need to keep up with changes to the entries
in case the annotation includes a new homology, or the predicted
gene changes (splice sites in eukaryotes, or the gtg alternative start
codon in the example you used).

>If a shell script is required, can anyone help with writing one? I'm
>afraid it's beyond my capabilities.....

Sadly, it's a little more complicated than that :-)

What SRS can do is extract the features of interest for new/changed
entries, for a given organism, since the last run.

Something like:

% getz "([emnew-org:Mycobacterium tuberculosis] & \
         [emnew-dat#19960500:])" \
         -f 'id acc dat def fts' >! mtcds.may96

... which takes only a second or two to run. You can also do the same
query through any of the SRSWWW servers (it will work on EMNEW or on
GBNEW, updates for EMBL or GenBank).

ICARUS in turn can be used to parse out the feature table fields of
interest, assuming that all entries have some common format (in this
case, all ORF names start with MTC, but you could also look for "/gene=" to
pick other entries).

ICARUS is due sometime soon (in SRS 5). Meanwhile, other languages like Perl
can do the job, although they are a little more complicated to write.
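
As a rough illustration of the filtering step described above (this is
neither ICARUS nor Perl, just a minimal C sketch), the test for "is this
feature of interest?" could look like the following. The function name
feature_is_of_interest and its orf_prefix parameter are assumptions made
for this example; it expects the text of a single feature block.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if an EMBL feature block looks interesting,
 * i.e. its /note text starts with the given ORF prefix (e.g. "MTC"),
 * or it carries a /gene= qualifier; return 0 otherwise.
 */
int feature_is_of_interest(const char *feature, const char *orf_prefix)
{
  const char *note = strstr(feature, "/note=\"");

  /* "/note=\"" is 7 characters long, so the note text starts at note + 7. */
  if (note != NULL &&
      strncmp(note + 7, orf_prefix, strlen(orf_prefix)) == 0)
    return 1;

  return strstr(feature, "/gene=") != NULL;
}
```

Calling feature_is_of_interest(feature, "MTC") on the CDS_pept block
quoted earlier returns 1, because its /note text begins with MTCY190.03c.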

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Mon Jun 03 23:00:00 1996
Path: biosci!bcm.tmc.edu!news.tamu.edu!keck.tamu.edu!ajackson
From: "Andrew J. Jackson" <ajackson@keck.tamu.edu>
Newsgroups: bionet.software.srs
Subject: Installing SRS v4.08 on SunOS
Date: Tue, 4 Jun 1996 09:47:58 -0500
Organization: Texas A&M University, College Station, TX
Lines: 77
Message-ID: <Pine.SUN.3.91.960604094130.776A-100000@keck.tamu.edu>
NNTP-Posting-Host: keck.tamu.edu
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Hi all,

I'm attempting to install SRS v4.08 on a SPARCstation 10 running
SunOS 4.1.3 and decided to use the gcc compiler. The build
("srsinstall all") does not finish because gcc complains about printf
being undeclared. Here's the output:

----- Begin Included File -----
press RETURN to continue: 
...updating file prep_srs
enter the make command [make]: 
enter the cc command [cc]: gcc
...inserting commands "make" and "gcc" into srsmake for SunOS
                 *** Welcome to SRS, version 4.08 ***
***  make ODD-Compiler .....
make for SunOS with parameters: odd
make: `/usr/software/srs4_08/bin/sunos/odd' is up to date.
the end
***  make objectbase (section) .....
............
............
.....................................
.....................................
 
________________________________________________________________________________
 
 made section files:
      "SRSSEC:srswin.sec" with 357 blocks 
      "SRSSEC:srswin.ptr" with 80 blocks 
 
 from input: "SRSSDL:srswin.sdl"
 
 defined 1197 objects of 27 different classes
 
_______________________________________________________________________________
 
***  make index builder .....
make for SunOS with parameters: srsbuild
make: `/usr/software/srs4_08/bin/sunos/srsbuild' is up to date.
the end
***  make index checker .....
make for SunOS with parameters: srscheck
make: `/usr/software/srs4_08/bin/sunos/srscheck' is up to date.
the end
***  make header file: srsenv.h .....
***  make getz (retrieval program) .....
make for SunOS with parameters: getz
gcc -Dsun -I/usr/software/srs4_08/bin/sunos  -g -DSRSINCLUDE=\"srswin.h\" -c -o /usr/software/srs4_08/bin/sunos/getz.o  getz.c
getz.c: In function `main':
getz.c:232: `printf' undeclared (first use this function)
getz.c:232: (Each undeclared identifier is reported only once
getz.c:232: for each function it appears in.)
getz.c:236: `printf' used prior to declaration
make: *** [/usr/software/srs4_08/bin/sunos/getz.o] Error 1
the end
 
NEXT STEPS TO DO:
-----------------
 
        source etc/prep_srs
        srscheck
        srsupdate

----- End Included File -----

I've been looking at it too long and am at a point where I need another 
pair of eyes and brains to figure out the obvious.

Thanks!

--
Andrew J. Jackson		W. M. Keck Center for Genome Informatics
ajackson@keck.tamu.edu		Texas A&M University
Programmer/Analyst		Institute of Biosciences & Technology
				Houston, TX



From owner-srs@net.bio.net Wed Jun 05 23:00:00 1996
Path: biosci!rutgers!uwm.edu!chi-news.cic.net!newsfeed.internetmci.com!news.wwa.com!news.ucdavis.edu!quad!knight
From: knight@quad.cs.ucdavis.edu (James Knight)
Newsgroups: bionet.software,bionet.software.gcg,bionet.software.staden,bionet.software.srs
Subject: Re: Software to extract annotation fields from EMBL/GenBank entries.
Followup-To: bionet.software,bionet.software.gcg,bionet.software.staden,bionet.software.srs
Date: 6 Jun 1996 00:06:07 GMT
Organization: University of California, Davis
Lines: 138
Message-ID: <4p57df$35l@mark.ucdavis.edu>
References: <b.robertson-0306961409530001@cb-11.sm.ic.ac.uk> <PMR.96Jun3164948@unst.sanger.ac.uk>
NNTP-Posting-Host: quad.cs.ucdavis.edu
X-Newsreader: TIN [version 1.2 PL2]
Xref: biosci bionet.software:15715 bionet.software.gcg:1827 bionet.software.staden:197 bionet.software.srs:262

This type of problem is one of the reasons I wrote my SEQIO
package.  Below is a complete C program which will extract
all of the /note fields from the CDS_pept features.  The
program takes a list of files and outputs the list of
entries and the note fields as follows:

For entry embl:MELLP
   late lactation protein precursor
For entry embl:MEMRNAAL
   pot. alpha lactalbumin preprotein
For entry embl:SC9920
   YM9920.01c, unknown, partial, len: 956, CAI: 0.14; PS00061 Short-chain alcohol dehydrogenase family signature
   YM9920.02c, unknown, len: 61, CAI: 0.17, possible small spliced gene
   YM9920.03c, unknown, len: 55, CAI: 0.13, possible small spliced gene
   YM9920.04, unknown, len: 585, CAI: 0.18, putative glutamate decarboxylase gene


It will also take care of splicing the lines together.  It
is, however, a little limited in that it can only handle
EMBL format files, and it only looks for the /note field in
each CDS_pept feature (these are fixed in the code).

For someone with a little programming experience, it should
be simple to modify the program to look for a different feature
or a different sub-field of a feature.

To compile the program, first extract the text from this
message, then ftp my SEQIO package from the following site

  ftp://ftp.cs.ucdavis.edu/pub/strings/seqio.tar.gz

unpack it, and compile the seqio.c and main program together
using a C compiler.  It should compile and run using any
Unix or Windows NT/95 machine (in Windows, run the program
from a DOS shell).

(***Advertisement***)
I wrote the SEQIO package for things just like this, where
someone with a little programming experience needs to do
something that no one has provided software for.  The
package simplifies all of the file and sequence I/O, so
that the programmer can concentrate on the new piece 
of code.  Its use does require C/C++ programming experience,
but, as I hope you can see from the example below, not
that much experience.
(***End of Advert***)

Jim


#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include "seqio.h"


int main(int argc, char *argv[])
{
  int len, i, flag;
  char *entry, *s, *t, *s2, *t2, *feature, *note, *id;
  SEQFILE *sfp;

  for (i=1; i < argc; i++) {
    if ((sfp = seqfopen2(argv[i])) == NULL)
      continue;
   
    if (strcmp(seqfformat(sfp, 0), "EMBL") != 0) {
      fprintf(stderr, "%s:  Not an EMBL file.\n", seqffilename(sfp, 0));
      seqfclose(sfp);    /* close the file before skipping it */
      continue;
    }

    /*
     * Read the entries.
     */
    while ((entry = seqfgetentry(sfp, &len, 0)) != NULL) {
      s = entry;
      flag = 0;
      while ((s = strstr(s, "\nFT   CDS_pept")) != NULL) {
        /*
         * Found a CDS, so print the entry's id, if it's the first CDS found.
         */
        if (!flag) {
          if ((id = seqfmainid(sfp, 0)) != NULL ||
              (id = seqfmainacc(sfp, 0)))
            printf("For entry %s\n", id);
          else
            printf("For an unknown entry\n");
          flag = 1;
        }

        /*
         * Find the end of the feature lines for that CDS, and make
         * it NUL-terminated.
         */
        feature = ++s;
        while (*s != '\n') s++;
        while (strncmp(s, "\nFT   ", 6) == 0 && isspace(s[6])) {
          s++;
          while (*s != '\n') s++;
        }
        *s = '\0';

        /*
         * Look for the /note field, then find the strings between
         * the quotes, squeezing out any line breaks.
         */
        if ((t = strstr(feature, "/note=\"")) != NULL) {
          note = t2 = s2 = t + 7;
          while (s2 < s && *s2 != '"') {
            if (*s2 == '\n') {
              s2 += 6;        /* Skip the "\nFT   " and then the spaces */
              while (*s2 != '\n' && isspace(*s2))
                s2++;
              *t2++ = ' ';
            }
            else {
              if (t2 != s2)
                *t2 = *s2;
              t2++;
              s2++;
            }
          }

          *t2 = '\0';
          printf("   %s\n", note);
        }

        *s = '\n';
      }
    }

    seqfclose(sfp);
  }

  return 0;
}
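
For readers who want to look for a different sub-field, the
note-extraction step can be pulled out into a function that takes the
qualifier name as a parameter. The sketch below is an assumption-laden
illustration: extract_qualifier is a hypothetical name and is not part of
SEQIO. Unlike the program above, it copies into a caller-supplied buffer
instead of editing the entry in place.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of SEQIO: copy the quoted value of the
 * qualifier `name` (e.g. "note" or "gene") from one EMBL feature block into
 * `out`, joining continuation lines with single spaces.
 * Returns 1 if the qualifier was found, 0 otherwise.
 */
int extract_qualifier(const char *feature, const char *name,
                      char *out, size_t outlen)
{
  char key[64];
  const char *s;
  size_t n = 0;

  /* key holds '/', the name, '=', '"' and the terminating NUL. */
  if (outlen == 0 || strlen(name) + 4 > sizeof key)
    return 0;
  snprintf(key, sizeof key, "/%s=\"", name);

  if ((s = strstr(feature, key)) == NULL)
    return 0;
  s += strlen(key);

  /* Copy up to the closing quote, truncating if `out` is too small. */
  while (*s != '\0' && *s != '"' && n + 1 < outlen) {
    if (*s == '\n') {
      s++;
      if (strncmp(s, "FT", 2) == 0)     /* skip the line-type code */
        s += 2;
      while (*s == ' ' || *s == '\t')   /* and the indentation */
        s++;
      out[n++] = ' ';                   /* one space replaces the break */
    }
    else
      out[n++] = *s++;
  }
  out[n] = '\0';
  return 1;
}
```

Passing "gene" instead of "note" as the second argument picks up /gene=
qualifiers with no other change.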


From owner-srs@net.bio.net Tue Jun 18 23:00:00 1996
Path: biosci!internet!biosci!not-for-mail
From: biohelp (BIOSCI Administrator)
Newsgroups: bionet.software.srs
Subject: IMPORTANT - BIOSCI Fundraising Update!
Date: 19 Jun 1996 02:00:19 -0700
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 154
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <199606190900.CAA05708@net.bio.net>
NNTP-Posting-Host: net.bio.net

	    BIOSCI is about halfway to its funding goal!!

I'm interrupting the usual monthly posting of the BIOSCI miniFAQ to
bring you up to date on BIOSCI fundraising progress, a topic of
concern to your future use of this resource.  Thank you in advance for
taking the time to read this message carefully.

Last year we announced that BIOSCI was going to adopt the U.S. Public
Broadcasting System model to fund its operations after our DOE/NSF
grant runs out later this year.  Unlike PBS, we are not soliciting
contributions from users; we are only selling ads on our Web pages
solely to cover our operating costs.  Our goal is to seek sponsorships
until we build up an operating reserve of about $100,000 and then
cease further promotions until we need to build the reserve back up.
(The accountants among our readership will be familiar with the
problem of deferred revenue which we cannot safely utilize until ads
have been displayed for a period of time.)  We are only about halfway
to our funding goal and need to raise further funds to avoid having to
curtail services at net.bio.net.  Fundraising is time-consuming,
however, and we need your help as explained further below.

Our operating costs consist of our network connection, phone lines,
hardware maintenance (we will be getting newer and faster hardware
soon!), plus 0.7 FTE of salaries covering UNIX systems admin,
technical support, quality assurance, i.e., testing, of our system,
and administrative costs (such as the time it takes to actually
find/write/call potential sponsors and raise money!).  Although the
BIOSCI staff does get compensated for a portion of the work that they
do, this project has always received a lot of free after-hours and
"vacation" time labor, so we hope that no one will begrudge the time
that we do charge to the project to serve you.  All three
part-time staff members, Dave Mack, Julie Lawrence, and myself, have
full time day jobs and families in addition to working hard to keep
this service running for all of you.  Julie and Dave Mack are
subcontractors for BIOSCI; my time that is charged to the project
defrays a portion of my regular salary instead of adding to my income.

Besides having to relocate the project, we were very busy this last
year building new infrastructure such as our WWW hypermail interface
to the system.  This was released last December along with scores of
WAIS indices for the newsgroups.  Virtually everything is complete,
although we do continue to find and fix bugs (many through your
helpful feedback!).  We are still having some problems with our WAIS
indexing.  The archives continue to grow rapidly.  We are running over
100 indexes now versus three previously and any systems crashes cause
greater havoc with the indexing than before!  We are still working to
fix this as fast as our resources permit and appreciate your patience,
but we have been able to automate a lot of the infrastructure to
reduce labor as compared to past requirements.

We have also implemented new software to make moderation of
BIOSCI/bionet newsgroups much easier and combat the growing problem of
Internet junk mail and USENET "spamming."  About 20% of our groups are
now moderated, many of them by the BIOSCI staff!  This, for example,
made a major difference last year in the quality of content in our
EMPLOYMENT/bionet.jobs.offered newsgroup which many commercial
concerns and recruiting firms are using **without charge** to recruit
candidates for positions in the biological sciences.

We are also now in a position to have sponsors for individual
newsgroups as you will have noticed if you have visited
http://www.bio.net/ and clicked on "Access the BIOSCI/bionet
newsgroups" recently.

So, how can you help??
----------------------

As noted above it can take a lot of time to contact potential sponsors
if I have to do it all myself.  Our request is quite simple.  You can
do two important things which will take very little time for you
individually.  

First, please use our WWW system at http://www.bio.net/ to access the
archives.  You can now post or reply to messages via your Web browser.
Your usage helps attract sponsors.  If you contact any of our
sponsors, please be sure to thank them for supporting BIOSCI.  It is
critical for them to get this feedback if they are to continue their
sponsorship for the long term.

Second, if you work for a company or organization that provides
products or services of interest to the biology community, please pass
this message on to your marketing or marketing communications
department or other appropriate group.  Please ask them to help
support BIOSCI by sponsoring our Web site and explain the uses and
benefits of the system to the biology community.  If they are
interested, they can then contact us for further information at our
tech support address, biosci-help@net.bio.net.

Our hope is to quickly raise several large corporate/institutional
sponsors on our heavily-used WWW locations (some stats appended
below), and then end this sponsorship campaign so that our resources
can continue to be used for service provision, not fundraising.  Many
of our specialty newsgroup WWW archives are still used by small
communities of scientists (and they haven't been heavily promoted
yet).  While these may be valuable niche markets to some advertisers,
finding these sponsors, pricing the locations fairly, and dealing with
lots of smaller sponsorships will generate more labor and overhead
than working with fewer mid-to-large sponsors.  We are striving to
keep our operation as lean and efficient as possible since we are not
trying to make careers out of running BIOSCI.  We are trying if at all
possible to avoid the administrative overhead entailed with processing
lots of small payments to reach our fundraising goals.

I'd like to thank all of you for your help in advance. In helping us,
you are also helping yourselves, not only in keeping this resource
available for all of the research communities, both large and small,
that we serve, but also by alleviating the need for us to go back and
compete with researchers for tight grant dollars!  We promised NSF
when we were awarded the BIOSCI grant that we would carry out this
mission to make the service self-supporting.  With your help, we will
succeed in continuing BIOSCI's work into its second decade.  Thank you
very much!

				Sincerely,

				Dave Kristofferson
				BIOSCI/bionet Manager

				biosci-help@net.bio.net


A list of our prime WWW sponsorship locations follows.  Please contact
us for further details.
----------------------------------------------------------------------

The overall BIOSCI WWW pages are currently visited by users from close
to 5500 unique computer hosts per week.  Web servers only log the
Internet computer/host name and frequently more than one individual
can connect to us from a particular host.

Main home page, http://www.bio.net, visited recently by about 2100
unique hosts per week

Main Newsgroups archives page, http://www.bio.net/archives.html,
visited recently by about 1200 unique hosts per week

BIO-JOURNALS archive page, http://www.bio.net/BIO-JOURNALS.html,
visited recently by about 1000 unique hosts per week.

EMPLOYMENT archive pages: http://www.bio.net:80/hypermail/EMPLOYMENT/ 
and monthly header pages, visited recently by about 800 unique hosts
per week.

Address database search page, http://www.bio.net/addrsearch.html,
visited recently by about 450 unique hosts per week.

Methods newsgroup archive pages,
http://www.bio.net:80/hypermail/METHDS-REAGNTS/ and monthly header
pages, visited recently by about 350 unique hosts per week.

Ads can also be displayed on various combinations of other
BIOSCI/bionet newsgroups.  Please contact us at
biosci-help@net.bio.net for details.
----------------------------------------------------------------------

From owner-srs@net.bio.net Sun Jun 23 23:00:00 1996
Path: biosci!daresbury!nntp-trd.UNINETT.no!Norway.EU.net!nntp.uio.no!news.cais.net!newsfeed.internetmci.com!in1.uu.net!news.puc.cl!macal20.facea.puc.cl!user
From: mdin@puc.cl (test)
Newsgroups: bionet.software.srs
Subject: test
Followup-To: bionet.software.srs
Date: 24 Jun 1996 21:23:31 GMT
Organization: nonr
Lines: 1
Distribution: world
Message-ID: <mdin-240696172704@macal20.facea.puc.cl>
NNTP-Posting-Host: macal20.facea.puc.cl

Sorry, I can't get a better place at cyberspace to post that stuff

From owner-srs@net.bio.net Mon Jun 24 23:00:00 1996
Path: biosci!daresbury!nntp-trd.UNINETT.no!Norway.EU.net!EU.net!enews.sgi.com!sgigate.sgi.com!nntp.coast.net!howland.reston.ans.net!vixen.cso.uiuc.edu!usenet.ucs.indiana.edu!sunflower.bio.indiana.edu!gilbertd
From: gilbertd@sunflower.bio.indiana.edu (Don Gilbert)
Newsgroups: bionet.software.srs
Subject: transfac accession query off by one
Date: 25 Jun 1996 01:14:36 GMT
Organization: Biology, Indiana University - Bloomington
Lines: 9
Message-ID: <4qnehs$1nn@usenet.ucs.indiana.edu>
NNTP-Posting-Host: sunflower.bio.indiana.edu

i tried installing transfac data into srs4 today, and
found that accession number queries are off by one
(query for R02509 returns R02508) and the acc num isn't listed
in the report.  this is using default transfac.sdl, and
fiddling w/ that sdl didn't cure anything.  same problem
shows up at srs/ebi & srs/sanger.
- don
-- 
-- d.gilbert--biocomputing--indiana u--bloomington--gilbertd@bio.indiana.edu

From owner-srs@net.bio.net Mon Jun 24 23:00:00 1996
Path: biosci!daresbury!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.srs
Subject: Re: transfac accession query off by one
Date: 25 Jun 1996 08:56:10 GMT
Organization: The Sanger Centre
Lines: 32
Message-ID: <PMR.96Jun25095610@unst.sanger.ac.uk>
References: <4qnehs$1nn@usenet.ucs.indiana.edu>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: gilbertd@sunflower.bio.indiana.edu's message of 25 Jun 1996 01:14:36 GMT

In article <4qnehs$1nn@usenet.ucs.indiana.edu> gilbertd@sunflower.bio.indiana.edu (Don Gilbert) writes:
>   i tried installing transfac data into srs4 today, and
>   found that accession number queries are off by one
>   (query for R02509 returns R02508) and the acc num isn't listed
>   in the report.  this is using default transfac.sdl, and
>   fiddling w/ that sdl didn't cure anything.  same problem
>   shows up at srs/ebi & srs/sanger.

Oops. Thought I had caught those ones. Nope, I did it for RHDB and friends
though.

The problem seems to be databases like TFSITE (in this case) where the
flat file has the accession number *before* the ID. This seems to mean that
when the "AC   R02509" line is reached, the parser still thinks it is
on the previous entry.

The fix is probably to make the "AC" line the ID for the entry, and
do something else with the ID line for now (I used "DF_NAM" for RHMAP).
This makes the accession number show up as the entry name.
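
The off-by-one can be reproduced with a toy model. The sketch below is
not SRS code; it only assumes, as described above, that one line type
opens a new entry and that each AC line is indexed under whichever entry
is current when it is read. The function name attribute_accessions and
the sample ID values used with it are made up for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Toy model of a flat-file indexer (an assumption for illustration; this is
 * not SRS code). A line starting with `start_tag` opens a new entry, and each
 * "AC" line is attributed to whichever entry is current at that moment.
 * owner[k] receives the entry number (0-based, -1 for none) given to the
 * k-th AC line. Returns the number of AC lines seen.
 */
int attribute_accessions(const char *lines[], int nlines,
                         const char *start_tag, int owner[])
{
  int current = -1;   /* no entry opened yet */
  int nacc = 0;
  int i;

  for (i = 0; i < nlines; i++) {
    if (strncmp(lines[i], start_tag, strlen(start_tag)) == 0)
      current++;                  /* a new entry begins here */
    if (strncmp(lines[i], "AC", 2) == 0)
      owner[nacc++] = current;    /* index the accession under that entry */
  }
  return nacc;
}
```

Given two entries whose lines run AC, ID, //, AC, ID, //, using "ID" as
the start tag attributes the second accession to entry 0, which is the
reported symptom. Using "AC" as the start tag attributes each accession
to its own entry.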

Our transfac.sdl file has this change now, and seems to be OK on a
quick test.

No doubt this is fixed in SRS 5, but for now we have to work around it,
as far as I can see.
--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division,
E-mail: pmr@sanger.ac.uk             | The Sanger Centre,
Tel: (44) 1223 494967                | Wellcome Trust Genome Campus,
Fax: (44) 1223 494919                | Hinxton, Cambs, CB10 1SA,
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Tue Jun 25 23:00:00 1996
Path: biosci!daresbury!nntp-trd.UNINETT.no!online.no!Norway.EU.net!EU.net!news-res.gsl.net!news.gsl.net!nntp.coast.net!swidir.switch.ch!scsing.switch.ch!news.belwue.de!fu-berlin.de!cs.tu-berlin.de!uni-erlangen.de!gs.dfn.de!immunbio.mpg.de!immunbio.mpg.de!nntp
Newsgroups: bionet.software.srs
Subject: Lookup indices & native SRS
Message-ID: <1996Jun26.121100.235@immunbio.mpg.de>
From: GARTMANN@IMMUNBIO.MPG.DE (Christoph Gartmann)
Date: 26 Jun 96 12:10:59 +0100
Distribution: world
Organization: Max-Planck-Institut fuer Immunbiologie
Nntp-Posting-Host: mpi1.immunbio.mpg.de
X-News-Reader: VMS NEWS 1.24
Lines: 16

Hello,

I think this question has been addressed before but I couldn't find it anymore.
So, is it possible to use the LookUp indices provided by GCG under SRS V4.08?
If so, how?

Regards,
   Christoph Gartmann


+----------------------------------------------------------------------------+
| Max-Planck-Institut fuer      Phone   : +49-761-5108-465   Fax: -221       |
| Immunbiologie                                                              |
| Postfach 1169                 Internet: gartmann@immunbio.mpg.de           |
| D-79011  Freiburg, FRG                                                     |
+----------- Do you know MENUE, the user environment for OpenVMS? -----------+

From owner-srs@net.bio.net Wed Jun 26 23:00:00 1996
Newsgroups: bionet.software.srs
Path: biosci!bcm.tmc.edu!cs.utexas.edu!swrinde!newsfeed.internetmci.com!news2.cais.net!news.cais.net!nntp.uio.no!nntp-oslo.UNINETT.no!nntp-trd.UNINETT.no!daresbury!bioftp.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Subject: This is the patch [Re: Lookup indices & native SRS]
Message-ID: <1996Jun27.191533.2323@comp.bioz.unibas.ch>
Organization: former EMBnet Switzerland [Basel]
X-Newsreader: TIN [version 1.2 PL2]
References: <1996Jun26.121100.235@immunbio.mpg.de>
Date: Thu, 27 Jun 1996 19:15:33 GMT
Lines: 221

Michael Schmitz wrote:
> 
> Christoph Gartmann (GARTMANN@IMMUNBIO.MPG.DE) wrote:
> : Hello,
> 
> : I think this question has been addressed before but I couldn't find it anymore.
> : So, is it possible to use the LookUp indices provided by GCG under SRS V4.08?
> : If so, how?
> 
> As far as I remember, the problem discussed before was the opposite: using
> existing SRS-build indices with Lookup. And that required changes to Lookup ...
> 
>         Michael Schmitz

Hello, 
I don't remember the precise patch either, and as I moved to another site I had to do it again; admittedly I restrict this posting to UNIX for now. There are three problems with using srs4_08 in GCG 8.1: 

1) the SRS version in GCG is much older than 4.08
2) it can do only sequence libraries 
3) building in GCG is not trivial, as it uses a complex makefile

The strategy to make it work is as follows. I appreciate any feedback.

Regards
Reinhard 







1)         MAKE NORMAL NATIVE SRS
=================================
Get srs4.08 and treat it as usual. Make sure that getz -libs gets you at least a single "sequence" type. 

% getz -libs
                 Library   Group              Entries    Index Date
-------------------------------------------------------------------
               SWISSPROT   Sequence             52205       4/25/96
                SWISSNEW   Sequence              2819       4/25/96
                     PIR   Sequence             82066       4/25/96
                   NRL3D   Sequence              6063       4/26/96
 ...


2)         MAKE A NEW DIRECTORY AND GET THE GCG STUFF WORKING 
=============================================================
Make a new directory, move there, invoke GCG and GCGSUPPORT, and get gensource:lookup.c as usual: 

% fetch gensource:lookup.c

Fetch copies GCG sequences or data files from the GCG database 
into your directory or displays them on your terminal screen.

 lookup.c


Next, get the makefiles (fetch make*) and change the "name" C example 
in makefile.mm to "lookup". Make the makefile. What follows is a crude 
hack, and the GCG folks might correct me, but it worked in my hands. 

Next, printenv the variable for the executable 
% printenv SRSEXE
Edit the makefile and add the value you got there (e.g. /sw/srs4_08/bin/irix) to the 'include' section 
_before_ the other includes (this is extremely important, as you will 
use GCG's includes otherwise). On my system this looks like 

CFLAGS = -Dirix -Dunix -D__LONGLONG -ansiposix -cckr -common $(DEBUGFLAGS) -I/sw/srs4_08/bin/irix -I$(IDIR)

Do the same with the link library statement, adding libsrs.a from your native SRS installation. 

CEXLIBS = -L/sw/srs4_08/bin/irix/ -lsrs -L$(LDIR) -lgenshare -l$(CURSES) -lm -lmalloc

(Remember that this will look different on your system, as you will have the SRS installation in a different path! Also, do not allow any space between -I or -L and the leading slash of the filename.) 

Try it. Don't get frustrated: it won't link yet, but you should get something like 

% make lookup
        cc -Dirix -Dunix -D__LONGLONG -ansiposix -cckr -common -DNDEBUG -I/sw/srs4_08/bin/irix/ -I/bio1/gcg/gcg81/gcgsource/include -c lookup.c
        cc  -DNDEBUG lookup.o /bio1/gcg/gcg81/gcgbin/oblib/libapp.a  -L/sw/srs4_08/bin/irix/ -lsrs -L/bio1/gcg/gcg81/gcgbin/oblib -lgenshare -lcurses -lm -lmalloc  -o lookup
ld:
Unresolved:
MsgGetFnct
*** Error code 1 

3)     MODIFY LOOKUP TO RUN ONLY SEQUENCE LIBRARIES 
===================================================

There are only three locations which you need to change. 
% diff lookup.c.orig lookup.c.patch
 
568a571
>     if (!strcmp(g->com, "Sequence")) { 
582c585
< 
---
>     } 
1702a1706,1712
> 
> static int  (*PrintMessage)(MSGo *) = NULL; /* pointer to print function */
> 
> INT4 (*MsgGetFnct (void))()
> {
>   return PrintMessage;
> }

The first point is that at line 568 there is a loop which covers all 
libraries. That's not what we need. As we got the group in the 
previous line, the statement 
 for(lcontext = 0; l = LibNextLib(g, &lcontext); )
needs to be executed only if g->com is "Sequence":

  571         for(lcontext = 0; l = LibNextLib(g, &lcontext); ) {
(gdb) p g 
$3 = (struct SRSoGROUP_ *) 0x10266404
(gdb) p *g
$4 = {key = 0x1028033c "S", com = 0x10280330 "Sequence", short_nm = 0x10280340 "SQ", 
  cmnt = 0x10280344 "Search sequence libraries", help = 0x10280360 "srs_me query lib", 
  single_f = 0 '\000', library = {0x10264228, 0x102642c4, 0x102645d0, 0x10264360, 0x102643fc, 0x10264498, 
    0x10264708, 0x10265c5c, 0x10265cf8, 0x0 <repeats 16 times>}, libraryN = 9, df_t = {0x102784ac, 
    0x10278484, 0x102784d4, 0x10278524, 0x1027854c, 0x10278574, 0x1027859c, 0x102785c4, 0x102785ec, 
    0x1027868c, 0x10278664, 0x10278614, 0x1027863c, 0x102784fc, 0x102786dc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 
    0x0, 0x0, 0x0, 0x0}, upd_f = 0 '\000'}

All we do, therefore, is add the line 

if (!strcmp(g->com, "Sequence")) {

before this line (after the line stating SLBo *l;) and end this 
condition just before the end of the main loop (stating 
  } /* end loop */) with a single }. 

This passage should now look like 

  for(count = 0, gcontext = 0; g = LibNextLibGroup(&gcontext);) {
    int lcontext;
    SLBo *l;

    if (!strcmp(g->com, "Sequence")) {
    for(lcontext = 0; l = LibNextLib(g, &lcontext); ) {

[... some lines deleted for clarity ...]

      if(!access(buf, R_OK))
        activeLib[count++] = l;
    }
   }    

  } /* end loop */

If you want to be really perfect and have more than 24 sequence libraries :-) you need to add a counter that tells the LibNextLib loop
to stop after 23 readings, but you'll miss everything beyond this point. 
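
The combined effect of the group test and such a counter can be sketched
in isolation. The struct types and the function collect_sequence_libs
below are hypothetical stand-ins, not the SRS or GCG declarations, and
the table capacity is passed in as a parameter because the real limit is
only an assumption here. The access() readability check from lookup.c is
left out to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the SRS group and library records. */
struct lib   { const char *name; };
struct group { const char *com; const struct lib *libs; int nlibs; };

/*
 * Collect libraries from groups whose name is exactly "Sequence", stopping
 * once the table is full. Returns the number of libraries stored in active[].
 */
int collect_sequence_libs(const struct group *groups, int ngroups,
                          const struct lib *active[], int max_active)
{
  int count = 0;
  int g, l;

  for (g = 0; g < ngroups; g++) {
    if (strcmp(groups[g].com, "Sequence") != 0)
      continue;                   /* skip non-sequence groups */
    for (l = 0; l < groups[g].nlibs && count < max_active; l++)
      active[count++] = &groups[g].libs[l];
  }
  return count;
}
```

Note that strcmp is case-sensitive, so the group name has to match
"Sequence" exactly, as it appears in the gdb output above.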

4)     ADD THE MISSING FUNCTION 
===============================

Last, we need to add the function MsgGetFnct and define PrintMessage, 
as these got dropped, at least in my native SRS 4.08 version. 
Append the lines 

static int  (*PrintMessage)(MSGo *) = NULL; /* pointer to print function */

INT4 (*MsgGetFnct (void))()
{
  return PrintMessage;
}

to lookup.c, exit the editor, and recompile with "make lookup". 
Try it out: 

% ./lookup

LookUp identifies sequences by name, accession number, author, organism,
keyword, title, reference, feature, definition, length, or date.  The output
is a list of sequences. 

The LookUp program is experimental in this release--please look carefully at
your results. 

 LOOKUP in what sequence libraries:

   a) swissprot
   b) swissnew

...

   m) All libraries
 
   n) quit

 Please choose one or more (* m *): 
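
The declarator used for MsgGetFnct in step 4 (a function returning a
pointer to a function) is easier to read when written with a typedef.
The following is only a sketch: INT4 is assumed here to be a plain
int, MSGo is a dummy structure, and the names MsgPrintFn, GetPrintFn
and report are made up for illustration. The real types come from the
SRS headers.

```c
#include <stddef.h>

/* Stand-ins for the SRS types (assumptions, not the real definitions). */
typedef int INT4;
typedef struct { const char *text; } MSGo;

/* "Pointer to a function taking MSGo * and returning INT4". */
typedef INT4 (*MsgPrintFn)(MSGo *);

static MsgPrintFn PrintMessage = NULL;  /* no print function installed */

/* Plays the same role as MsgGetFnct in the patch: hand back the pointer. */
MsgPrintFn GetPrintFn(void)
{
    return PrintMessage;
}

/* Hypothetical caller: call the print function only if one is installed. */
INT4 report(MSGo *m)
{
    MsgPrintFn fn = GetPrintFn();

    return fn ? fn(m) : 0;
}
```

Because PrintMessage starts out as NULL, any caller of such a getter
has to check the returned pointer before calling through it, as report
does here.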


5.    RETURN THE BINARY TO THE USUAL PLACE 
==========================================

Finally, keep the original binary and copy the new one to the usual place 

% mv $GCGUTILDIR/lookup $GCGUTILDIR/lookup.orig
% cp lookup $GCGUTILDIR/lookup

and keep a safe copy of the modified source code. 



DISCLAIMER
==========

No warranty for this patch. This is an unofficial fix, and
neither I nor GCG nor my employer is responsible for 
this posting or any consequence arising from it.



[PS: Please don't reply to the address on this message, as I don't read
my mailbox at this address frequently. I have posted from my new address
once before, and you can get the mail address from there if you need it.]
-- 
Reinhard Doelz, Basel, Switzerland
 

From owner-srs@net.bio.net Wed Jun 26 23:00:00 1996
Path: biosci!rutgers!sgigate.sgi.com!swrinde!howland.reston.ans.net!surfnet.nl!swsbe6.switch.ch!scsing.switch.ch!news.belwue.de!fu-berlin.de!zrz.TU-Berlin.DE!cs.tu-berlin.de!uni-erlangen.de!gs.dfn.de!immunbio.mpg.de!immunbio.mpg.de!nntp
Newsgroups: bionet.software.srs
Subject: Re: Lookup indices & native SRS
Message-ID: <1996Jun27.174142.237@immunbio.mpg.de>
From: GARTMANN@IMMUNBIO.MPG.DE (Christoph Gartmann)
Date: 27 Jun 96 17:41:42 +0100
References: <1996Jun26.121100.235@immunbio.mpg.de> <PMR.96Jun27101134@unst.sanger.ac.uk>
Distribution: world
Organization: Max-Planck-Institut fuer Immunbiologie
Nntp-Posting-Host: mpi1.immunbio.mpg.de
X-News-Reader: VMS NEWS 1.24
In-Reply-To: pmr@sanger.ac.uk's message of 27 Jun 1996 09:11:34 GMT
Lines: 35

In <PMR.96Jun27101134@unst.sanger.ac.uk> pmr@sanger.ac.uk writes:

> In article <1996Jun26.121100.235@immunbio.mpg.de> GARTMANN@IMMUNBIO.MPG.DE (Christoph Gartmann) writes:
> 
> >I think this question has been addressed before but I couldn't find
> >it anymore.
> >
> >So, is it possible to use the LookUp indices provided by GCG under
> >SRS V4.08?
> 
> Yes, but I don't think you would want to.
> 
> GCG's indices have no links to databases like PROSITE (actually, they
> don't even index prosite).

Ok, so the next question is: as SRS creates separate indices for all the
fields in a database (e.g. EMBL_AUT, EMBL_FTS,...), is it possible to
use these and create links via native SRS? 
 
> A more reasonable question is "can I use my normal SRS 4.08 indices
> with GCG?" to which the answer is "yes, but GCG doesn't make it easy."

At the moment I don't provide LOOKUP to the users. I simply thought I could
save time creating indices.
 
Regards,
   Christoph Gartmann


+----------------------------------------------------------------------------+
| Max-Planck-Institut fuer      Phone   : +49-761-5108-465   Fax: -221       |
| Immunbiologie                                                              |
| Postfach 1169                 Internet: gartmann@immunbio.mpg.de           |
| D-79011  Freiburg, FRG                                                     |
+----------- Do you know MENUE, the user environment for OpenVMS? -----------+

From owner-srs@net.bio.net Wed Jun 26 23:00:00 1996
Path: biosci!bcm.tmc.edu!cs.utexas.edu!howland.reston.ans.net!surfnet.nl!swsbe6.switch.ch!scsing.switch.ch!ubaclu.unibas.ch!nowhere.uucp!schmitz
Newsgroups: bionet.software.srs
Subject: Re: Lookup indices & native SRS
Message-ID: <1996Jun27.112837.46665@yogi.urz.unibas.ch>
From: schmitz@comp.bioz.unibas.ch (Michael Schmitz)
Date: 27 Jun 96 11:28:37 MET
Reply-To: schmitz@bioa.embnet.unibas.ch
References: <1996Jun26.121100.235@immunbio.mpg.de>
Distribution: world
Organization: Biocomputing, Biozentrum Basel
Nntp-Posting-Host: biopcl.embnet.unibas.ch
X-Newsreader: TIN [version 1.2 PL2]
Lines: 11

Christoph Gartmann (GARTMANN@IMMUNBIO.MPG.DE) wrote:
: Hello,

: I think this question has been addressed before but I couldn't find it anymore.
: So, is it possible to use the LookUp indices provided by GCG under SRS V4.08?
: If so, how?

As far as I remember, the problem discussed before was the opposite: using
existing SRS-built indices with Lookup. And that required changes to Lookup ...

	Michael Schmitz

From owner-srs@net.bio.net Wed Jun 26 23:00:00 1996
Path: biosci!bcm.tmc.edu!cs.utexas.edu!howland.reston.ans.net!nntp.coast.net!dispatch.news.demon.net!demon!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.srs
Subject: Re: Lookup indices & native SRS
Date: 27 Jun 1996 09:11:34 GMT
Organization: The Sanger Centre
Lines: 40
Distribution: world
Message-ID: <PMR.96Jun27101134@unst.sanger.ac.uk>
References: <1996Jun26.121100.235@immunbio.mpg.de>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: GARTMANN@IMMUNBIO.MPG.DE's message of 26 Jun 96 12:10:59 +0100

In article <1996Jun26.121100.235@immunbio.mpg.de> GARTMANN@IMMUNBIO.MPG.DE (Christoph Gartmann) writes:

>I think this question has been addressed before but I couldn't find
>it anymore.
>
>So, is it possible to use the LookUp indices provided by GCG under
>SRS V4.08?

Yes, but I don't think you would want to.

GCG's indices have no links to databases like PROSITE (actually, they
don't even index prosite).

A more reasonable question is "can I use my normal SRS 4.08 indices
with GCG?" to which the answer is "yes, but GCG doesn't make it easy."

There are two fixes. One posted here some time ago by Reinhard Doelz,
and another which I have hacked together here but not used much (I
tell users to use getz instead of "LookUp" as GCG's LookUp makes no
use of most of SRS anyway).

The basic problem is that with all the usual SRS indices, GCG's menu
blows up - also, GCG assumes that all SRS indexed databases are
sequence ones. So, basically you have to tell LookUp to only
use the sequence databases and ignore the rest.

Oh yes, you also have to watch out for all those duplicated commands
that GCG didn't rename: srsbuild, srscheck, odd

I suggest keeping users away from LookUp, otherwise when SRS 5 and
GCG 9 come along you could have an interesting time fixing it all over
again.

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division,
E-mail: pmr@sanger.ac.uk             | The Sanger Centre,
Tel: (44) 1223 494967                | Wellcome Trust Genome Campus,
Fax: (44) 1223 494919                | Hinxton, Cambs, CB10 1SA,
URL: http://www.sanger.ac.uk/~pmr/   | England

