From owner-srs@net.bio.net Tue Aug 01 23:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!swrinde!tank.news.pipex.net!pipex!sunsite.doc.ic.ac.uk!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.srs
Subject: Database-specific help text
Date: 02 Aug 1995 13:11:51 GMT
Organization: The Sanger Centre
Lines: 34
Message-ID: <PMR.95Aug2141151@unst.sanger.ac.uk>
NNTP-Posting-Host: unst.sanger.ac.uk

Yet another SRSWWW enhancement (this is fun :-)

One I have long wanted is to be able to add help text for those strange
extra fields in new databases, or even to override the default help
information for standard fields. For example, to explain whether spaces
are included in indexed organism names or keywords for a particular database.

I have now modified wgetz to do the following:

1. Check for <dbname-FIELDTYPE-fieldname> in wgetz_text.script
   and use this version if found

2. *then* check for the usual <FIELDTYPE-fieldname> in wgetz_text.script
   and use this (these are defined for the standard fields like "Definition")
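
As a rough illustration of the lookup order above (this is not the actual
wgetz code), the fallback can be sketched in C. The table type, the function
names and the literal key strings are assumptions made for the example; the
real format of wgetz_text.script is not modelled here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the entries of wgetz_text.script. */
struct help_entry {
    const char *key;   /* e.g. "EC2D-FIELDTYPE-SpotName" */
    const char *text;  /* help text stored under that key */
};

/* Return the text stored under key, or NULL if there is none. */
static const char *find_entry(const struct help_entry *tab, size_t n,
                              const char *key)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(tab[i].key, key) == 0)
            return tab[i].text;
    return NULL;
}

/* Try the database-specific key first, then fall back to the generic one. */
const char *help_lookup(const struct help_entry *tab, size_t n,
                        const char *dbname, const char *fieldname)
{
    char key[128];
    const char *text;

    snprintf(key, sizeof key, "%s-FIELDTYPE-%s", dbname, fieldname);
    text = find_entry(tab, n, key);
    if (text != NULL)
        return text;

    snprintf(key, sizeof key, "FIELDTYPE-%s", fieldname);
    return find_entry(tab, n, key);
}
```

With a table holding both "EC2D-FIELDTYPE-SpotName" and "FIELDTYPE-SpotName",
a lookup for EC2D returns the first entry and a lookup for any other database
returns the second.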


These definitions allow hyperlinks to show up on the database info page,
pointing to information about how the database has been indexed.
For example (so far I only added SpotName for EC2D as a test):

http://www.sanger.ac.uk/srs/srsc?-info+EC2D

Note for the adventurous:

I have put the new versions of the SRS files on ftp.sanger.ac.uk in
directory pub/pmr/srs, but please use with caution :-)

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Tue Aug 01 23:00:00 1995
Path: biosci!daresbury!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.srs
Subject: wgetz_text.script serving
Date: 02 Aug 1995 11:59:26 GMT
Organization: The Sanger Centre
Lines: 24
Message-ID: <PMR.95Aug2125926@unst.sanger.ac.uk>
NNTP-Posting-Host: unst.sanger.ac.uk

I have been wondering about how best to pick up the full details of new
databases at remote sites.

SRSWWW serves the SDL files, but to install a database you also need
the details for the wgetz_text.script file.

After a little searching through the source code, I have discovered how
the SDL files are served. It turned out to be very easy to use the same
code to serve some of the other files. I have now updated the SRSWWW server
at the Sanger Centre to serve the wgetz_text.script file on each database
help page. For example:

http://www.sanger.ac.uk/srs/srsc?-info+EC2D

I will try to include these changes in a future release of SRS (when Thure
gets back :-)

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Tue Aug 01 23:00:00 1995
Path: biosci!bcm!cs.utexas.edu!swrinde!tank.news.pipex.net!pipex!sunsite.doc.ic.ac.uk!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.srs
Subject: Simplifying hyperlinks
Date: 02 Aug 1995 10:50:38 GMT
Organization: The Sanger Centre
Lines: 55
Message-ID: <PMR.95Aug2115038@unst.sanger.ac.uk>
NNTP-Posting-Host: unst.sanger.ac.uk

A suggestion for the SRS gurus out there ....

When adding a new SRS database, normally two files must be updated:
"dbname.sdl" and "hyperlink.sdl"

The first file defines the database to be indexed. The second file defines
the hyperlinks for all databases. The help page for each database on the
SRSWWW server links to both files so other sites can pick them up.

The "hyperlink.sdl" files are getting very large - especially for those
sites who are busy adding new databases. It is getting very complicated to
pick up some other site's "hyperlink.sdl" and find which extra lines
are needed to update a local copy.

From conversations with Thure, and some experimenting locally, it appears
possible to forget the hyperlink.sdl file completely, and to put all
those statements into the dbname.sdl file(s).

The main trick is to define a database-specific parser, for example:

#hyperlink /field=@RHDB_DR_FIELD
    /parse=rhdbdr /parser=@INSERTLINK_RH_PARSER

and to define just the fields you need in a small parser in dbname.sdl
rather than use one enormous INSERTLINK_PARSER in "hyperlink.sdl" for
everything.

An example of this is in the file "rhdb.sdl" at the Sanger Centre, linked
to the RHDB help page:

http://www.sanger.ac.uk/srs/srsc?-info+RHDB

There are some complications however. The "#linkcall" statements at
present have to be in the individual database file (the call to fetch an
EMBL entry for example would go into "embl.sdl"). This means that (in the
RHDB case) the call to fetch a DBEST entry goes into the "dbest.sdl" file.
On the other hand, if we had no DBEST database locally it would have to go
somewhere else. Most sites should have a lot of these external links
(to GDB for example).

Perhaps these other (external) calls should remain in the hyperlink.sdl
file, so it has some uses.

The "#link" statements in the "dbname.sdl" files already refer to
definitions in other local "dbname.sdl" files, so life is already not
so simple when picking up sdl files.

Comments anyone?
--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Thu Aug 03 23:00:00 1995
Path: biosci!daresbury!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.srs
Subject: Hyperlink overdose
Date: 04 Aug 1995 12:01:44 GMT
Organization: The Sanger Centre
Lines: 15
Message-ID: <PMR.95Aug4130144@unst.sanger.ac.uk>
NNTP-Posting-Host: unst.sanger.ac.uk

I seem to be running into problems with hyperlink.sdl

Now if I add a new link, I have to take an old one away otherwise
srssection gives a segmentation fault.

Removing hyperlink.sdl is looking more and more like a good idea :-)


--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Sun Aug 06 23:00:00 1995
Path: biosci!agate!sunsite.doc.ic.ac.uk!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.srs
Subject: Re: Hyperlink overdose
Date: 07 Aug 1995 11:16:24 GMT
Organization: The Sanger Centre
Lines: 25
Message-ID: <PMR.95Aug7121624@unst.sanger.ac.uk>
References: <PMR.95Aug4130144@unst.sanger.ac.uk>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: pmr@sanger.ac.uk's message of 04 Aug 1995 12:01:44 GMT

In article <PMR.95Aug4130144@unst.sanger.ac.uk> pmr@sanger.ac.uk (Peter Rice) writes:
>   I seem to be running into problems with hyperlink.sdl
>
>   Now if I add a new link, I have to take an old one away otherwise
>   srssection gives a segmentation fault.

Problem solved.

hyperlink.sdl (actually all parsers in any sdl file) has a maximum line
length of 100 characters.

The hyperlink.sdl file in the distribution exceeds this limit a little. Our
version of the file exceeded it by quite a lot, just before the point where
I was having trouble.

I have increased the line size (PRSxXBNFLN) in parser.h (to 256) and all
seems to be well.
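
For anyone who wants to check their own files before hitting the same
problem, a small C sketch that counts over-long lines follows. The function
name is made up for the example, and whether the real limit bites at exactly
100 characters or one less depends on how the parser uses its buffer, which
is not covered here.

```c
#include <stdio.h>

/* Count the lines in fp that are longer than limit characters
 * (the newline itself is not counted). */
long count_long_lines(FILE *fp, long limit)
{
    long len = 0, count = 0;
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n') {
            if (len > limit)
                count++;
            len = 0;
        } else {
            len++;
        }
    }
    if (len > limit)   /* last line without a trailing newline */
        count++;
    return count;
}
```

Calling it on an open sdl file with a limit of 100 gives the number of lines
that would not fit the default buffer size described above.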

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Mon Aug 14 23:00:00 1995
Path: biosci!daresbury!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: GCG 8.1 clash with SRS
Date: 15 Aug 1995 15:27:11 GMT
Organization: The Sanger Centre
Lines: 37
Message-ID: <PMR.95Aug15162711@unst.sanger.ac.uk>
NNTP-Posting-Host: unst.sanger.ac.uk
Xref: biosci bionet.software.gcg:1344 bionet.software.srs:125

GCG 8.1 just arrived, and I have a few questions before I try to
install it...

I see from the System Support manual (pages 43-48) that GCG 8.1 includes
programs with names "srssection" "odd" "srscheck" "srsupdate" and
"srsbuild" which are exactly the same as the names SRS 4 uses.

Has anyone already installed GCG 8.1 at a site where SRS is already running?

Is there any experience with how to run the two together?

What version of SRS is included in GCG 8.1? Some of the documentation
(for example using "MaxIndexSizeKb" on page 45 of the "System Utilities"
manual) looks out of date.

What SDL files does it use? For example, does it include prosite ?
(the System Support manual example seems to only have Swissprot, PIR,
EMBL and GenBank but could just be a truncated list)

Also, the "LookUp" program (actually a front-end to SRS) has a list of
libraries in the program manual example. What happens to this at sites
with other databases, or without some of the listed databases? (for example,
we do not have genbank or "gb_tags")

------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Tue Aug 15 23:00:00 1995
Path: biosci!agate!howland.reston.ans.net!Germany.EU.net!news.dfn.de!news.embl-heidelberg.de!usenet
From: Thure Etzold <etzold>
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: GCG 8.1 clash with SRS
Date: 16 Aug 1995 12:09:05 GMT
Organization: EMBL Heidelberg
Lines: 38
Distribution: world
Message-ID: <40sn51$dg4@lion.embl-heidelberg.de>
References: <PMR.95Aug15162711@unst.sanger.ac.uk>
NNTP-Posting-Host: phenix.embl-heidelberg.de
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 1.1S (X11; I; IRIX64 6.0.1 IP21)
X-URL: news:PMR.95Aug15162711@unst.sanger.ac.uk
Xref: biosci bionet.software.gcg:1345 bionet.software.srs:126

>
>I see from the System Support manual (pages 43-48) that GCG 8.1 includes
>programs with names "srssection" "odd" "srscheck" "srsupdate" and
>"srsbuild" which are exactly the same as the names SRS 4 uses.

the version of the 'srs' in gcg is 4.05 ...so it is pretty up to date


>
>What SDL files does it use? For example, does it include prosite ?
>(the System Support manual example seems to only have Swissprot, PIR,
>EMBL and GenBank but could just be a truncated list)

the gcg version supports only sequence databases ...prosite is not provided
for

>
>Also, the "LookUp" program (actually a front-end to SRS) has a list of
>libraries in the program manual example. What happens to this at sites
>with other databases, or without some of the listed databases? (for example,
>we do not have genbank or "gb_tags")
>

i think it should be safe to add more sequence databanks to the gcg
installation...note that only databanks with gcg sequence format are
accepted! ...if you have another SRS installation on your site then the
safest is to keep that site but, of course, the indices from the gcg srs
installations can be used. The directory with the indices can be
specified for each databank: attribute "indexDir" of the #libenv record
in the file srsdb.sdl

-- 
===============================================================================
Thure Etzold                                   | EMBL
E-mail: etzold@embl-heidelberg.de              | Postfach 10.2209
Tel: (49) 6221 387529                          | 69012 Heidelberg
Fax: (49) 6221 387517                          | Germany


From owner-srs@net.bio.net Tue Aug 15 23:00:00 1995
Path: biosci!daresbury!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: GCG 8.1 clash with SRS
Date: 16 Aug 1995 12:59:56 GMT
Organization: The Sanger Centre
Lines: 59
Distribution: world
Message-ID: <PMR.95Aug16135956@unst.sanger.ac.uk>
References: <PMR.95Aug15162711@unst.sanger.ac.uk> <40sn51$dg4@lion.embl-heidelberg.de>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: Thure Etzold's message of 16 Aug 1995 12:09:05 GMT
Xref: biosci bionet.software.gcg:1346 bionet.software.srs:127

In article <40sn51$dg4@lion.embl-heidelberg.de> Thure Etzold <etzold> writes:
>>I see from the System Support manual (pages 43-48) that GCG 8.1 includes
>>programs with names "srssection" "odd" "srscheck" "srsupdate" and
>>"srsbuild" which are exactly the same as the names SRS 4 uses.
>
>the version of the 'srs' in gcg is 4.05 ...so it is pretty up to date

Hi Thure. You are going to have to solve the same problem when you install
GCG 8.1. How do you plan to handle the multiple versions of srsbuild
and the programs?

>>What SDL files does it use? For example, does it include prosite ?
>>(the System Support manual example seems to only have Swissprot, PIR,
>>EMBL and GenBank but could just be a truncated list)
>
>the gcg version supports only sequence databases ...prosite is not provided
>for

That alone makes it desirable to run with the real SRS indices.

>>Also, the "LookUp" program (actually a front-end to SRS) has a list of
>>libraries in the program manual example. What happens to this at sites
>>with other databases, or without some of the listed databases? (for example,
>>we do not have genbank or "gb_tags")

What does LookUp do that SRS can't? Did they change the output format
somehow?

How closely coupled are GCG and SRS? Can Lookup simply use standard
SRS indices (with SRS prepared), or does it need source editing and a
rebuild of LookUp?

>i think it should be safe to add more sequence databanks to the gcg 
>installation...note that only databanks with gcg sequence format are
>accepted! ...if you have another SRS installation on your site than the
>safest is to keep that site but, of course, the indices from the gcg srs 
>installations can be used. The directory with the indices can be
>specified for each databank: attribute "indexDir" of the #libenv record
>in the file srsdb.sdl

So GCG has to run from a separate index directory if a site has any
database defined in SRS as a "Sequence" database but not in GCG format,
NRSUB at the EBI for example (at the Sanger Centre it just happens to have
moved to "Genome" along with other databases). Or have I misunderstood?

Presumably then GCG's LookUp is picking up "/group=@SEQUENCE_LIBS"
*only* and is not checking the sequence format? Or is it able to skip
databases in EMBL flatfile format (for example)?

This seems to imply needing either a completely separate set of indices
for GCG, or a lot of playing around with symbolic links (on Unix).

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Tue Aug 15 23:00:00 1995
Path: biosci!daresbury!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.drosophila,bionet.software.srs
Subject: Re: Major FlyBase Update
Date: 16 Aug 1995 13:50:34 GMT
Organization: The Sanger Centre
Lines: 43
Distribution: world
Message-ID: <PMR.95Aug16145034@unst.sanger.ac.uk>
References: <9508151547.AA27695@morgan.harvard.edu>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: flybase-help@MORGAN.HARVARD.EDU's message of 15 Aug 1995 08:49:30 -0700
Xref: biosci bionet.drosophila:1309 bionet.software.srs:128

In article <9508151547.AA27695@morgan.harvard.edu> flybase-help@MORGAN.HARVARD.EDU (FlyBase Project Members) writes:
>       I am surprised at your report of finding fewer than 7500 genes when
>   you did the SRS indexing.  Something strange must be going on at your
>   end or ours, suggest you send email to flybase-help@morgan.harvard.edu
>   with any additional details.  (I expect the readers of bionet.drosophila
>   do not wish to see all the technical details, though we might wish
>   to inform them of the end result of pinning down whatever the problem is.)
>   All I can tell you is that in the genes.txt file on IUBio there are 9012
>   lines that begin with "*a" and that in the genes.rpt file there are 9012
>   lines that beging with "Gene symbol".  Appreciate your help if there is
>   any problem in what we have provided.

(note for bionet.software.srs readers - this is part of a thread on
bionet.drosophila about SRS indexing of the flybase genes.txt file)

Success at last. The "missing" genes turned out to be those with a "\"
character in the gene name (Dvir\sev for example).

SRS appeared unable to recognize the "\" character, and truncated all
the names. This left many genes (about 1500) with just the species
code, and SRS only counts the number of different names found.

The solution, which took a bit of head scratching, was:

In writing the flybase.sdl file, the "\" character must be escaped
twice (apparently escaping gets checked twice in parsing),
so the definition of an id for FLYGENE has the sequence "\\\\"
to represent a single "\" character.
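
To see why four backslashes are needed, here is a small C model of an
unescaping step applied twice. It only illustrates the principle; the
function is hypothetical and does not reproduce how SRS itself parses the
sdl file.

```c
/* One unescaping pass: a backslash makes the next character literal.
 * The result is written to out, which must be at least as large as in. */
void unescape_once(const char *in, char *out)
{
    while (*in != '\0') {
        if (*in == '\\' && in[1] != '\0')
            in++;            /* drop the backslash, keep what follows */
        *out++ = *in++;
    }
    *out = '\0';
}
```

Applied to four backslashes, the first pass leaves two and the second pass
leaves one. A single unescaped backslash, as in Dvir\sev, does not survive
even one pass.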

Having done that (and included a few extra characters that I was missing
before as valid in gene names) I now indeed have 9012 genes indexed.

Many, many thanks to the flybase community, who gathered round to ask
what was happening rather like, well like fruit flies around a ripe
banana I suppose :-)

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Tue Aug 15 23:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!Germany.EU.net!news.dfn.de!news.embl-heidelberg.de!usenet
From: Thure Etzold <etzold>
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: GCG 8.1 clash with SRS
Date: 16 Aug 1995 15:21:04 GMT
Organization: EMBL Heidelberg
Lines: 68
Distribution: world
Message-ID: <40t2d0$jlm@lion.embl-heidelberg.de>
References: <PMR.95Aug15162711@unst.sanger.ac.uk> <40sn51$dg4@lion.embl-heidelberg.de> <PMR.95Aug16135956@unst.sanger.ac.uk>
NNTP-Posting-Host: phenix.embl-heidelberg.de
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 1.1S (X11; I; IRIX64 6.0.1 IP21)
X-URL: news:PMR.95Aug16135956@unst.sanger.ac.uk
Xref: biosci bionet.software.gcg:1347 bionet.software.srs:130

>Hi Thure. You are going to have to solve the same problem when you install
>GCG 8.1. How do you plan to handle the multiple versions of srsbuild
>and the programs?
>

you could just keep two srs trees. The only things that can be 
shared between the two are the indices and the flatfiles

>>the gcg version supports only sequence databases ...prosite is not provided
>>for
>
>That alone makes it desirable to run with the real SRS indices.

the gcg indices ARE real srs indices!

>
>>>Also, the "LookUp" program (actually a front-end to SRS) has a list of
>>>libraries in the program manual example. What happens to this at sites
>>>with other databases, or without some of the listed databases? (for example,
>>>we do not have genbank or "gb_tags")
>
>What does LookUp do that SRS can't? Did they change the output format
>somehow?

lookup provides almost the same functionality as getz.

>
>How closely coupled are GCG and SRS? Can Lookup simply use standard
>SRS indices (with SRS prepared), or does it need source editing and a
>rebuild of LookUp?

lookup comes with a full set of SRS programs for building indices ...the
main difference is the restriction to sequence databanks in GCG formats
so that SRS will always be happy with lookup's indices but not
the other way round

>
>
>So GCG has to run from a separate index directory if a site has any
>database defined in SRS as a "Sequence" database but not in GCG format,
>NRSUB at the EBI for example (at the Sanger Centre it just happens to have
>moved to "Genome" along with other databases). Or have I misunderstood?

correct

>
>Presumably then GCG's LookUp is picking up "/group=@SEQUENCE_LIBS"
>*only* and is not checking the sequence format? Or is it able to skip
>databases in EMBL flatfile format (for example)?

no those would have to be removed from the group
>
>This seems to imply needing either a completely separate set of indices
>for GCG, or a lot of playing around with symbolic links (on Unix).
>

as said in the previous posting, in your SRS installation you can specify
the index directory from lookup for those databanks that are supported.

Thure

-- 
===============================================================================
Thure Etzold                                   | EMBL
E-mail: etzold@embl-heidelberg.de              | Postfach 10.2209
Tel: (49) 6221 387529                          | 69012 Heidelberg
Fax: (49) 6221 387517                          | Germany


From owner-srs@net.bio.net Tue Aug 15 23:00:00 1995
Path: biosci!daresbury!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.drosophila,bionet.software.srs
Subject: Re: Major FlyBase Update
Date: 16 Aug 1995 14:42:49 GMT
Organization: The Sanger Centre
Lines: 20
Distribution: world
Message-ID: <PMR.95Aug16154249@unst.sanger.ac.uk>
References: <9508151547.AA27695@morgan.harvard.edu> <PMR.95Aug16145034@unst.sanger.ac.uk>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: pmr@sanger.ac.uk's message of 16 Aug 1995 13:50:34 GMT
Xref: biosci bionet.drosophila:1310 bionet.software.srs:129

In article <PMR.95Aug16145034@unst.sanger.ac.uk> pmr@sanger.ac.uk (Peter Rice) writes:
>   Having done that (and included a few extra characters that I was missing
>   before as valid in gene names) I now indeed have 9012 genes indexed.

Actually, a little clarification is needed.

SRS is case-insensitive. There are still some 144 genes with the same name
with just the case changed ("Hup" and "hup" for example) which SRS indexes
as "HUP" so if you look at the total it says 8868.

But they are all there, with the full flybase names. Promise.
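
The difference between the two totals (9012 names, 144 of which collide
with another name once case is ignored, leaving 8868) can be reproduced
with a small C sketch that counts distinct names case-insensitively. The
function names are made up for the example and this is not how SRS builds
its index.

```c
#include <ctype.h>
#include <stdlib.h>

/* Compare two strings ignoring case; usable as a qsort callback on an
 * array of const char * elements. */
static int cmp_nocase(const void *pa, const void *pb)
{
    const char *a = *(const char *const *)pa;
    const char *b = *(const char *const *)pb;

    while (*a != '\0' &&
           toupper((unsigned char)*a) == toupper((unsigned char)*b)) {
        a++;
        b++;
    }
    return toupper((unsigned char)*a) - toupper((unsigned char)*b);
}

/* Count names that differ by more than case. Sorts the array in place. */
size_t count_distinct_nocase(const char **names, size_t n)
{
    size_t i, distinct;

    if (n == 0)
        return 0;
    qsort(names, n, sizeof names[0], cmp_nocase);
    distinct = 1;
    for (i = 1; i < n; i++)
        if (cmp_nocase(&names[i - 1], &names[i]) != 0)
            distinct++;
    return distinct;
}
```

Given "Hup" and "hup" the function counts one name, which is the effect
seen in the index total.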


--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Wed Aug 16 23:00:00 1995
Path: biosci!rutgers!gatech!howland.reston.ans.net!Germany.EU.net!news.dfn.de!news.embl-heidelberg.de!usenet
From: Thure Etzold <etzold>
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: GCG 8.1 clash with SRS
Date: 17 Aug 1995 11:25:34 GMT
Organization: EMBL Heidelberg
Lines: 26
Distribution: world
Message-ID: <40v8ve$5li@lion.embl-heidelberg.de>
References: <PMR.95Aug15162711@unst.sanger.ac.uk> <40sn51$dg4@lion.embl-heidelberg.de> <40ut6e$erd@hermod.uio.no>
NNTP-Posting-Host: phenix.embl-heidelberg.de
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 1.1S (X11; I; IRIX64 6.0.1 IP21)
X-URL: news:40ut6e$erd@hermod.uio.no
Xref: biosci bionet.software.gcg:1349 bionet.software.srs:132

rodrigol@biotek.uio.no (Rodrigo Lopez) wrote:

>>i think it should be safe to add more sequence databanks to the gcg 
>>installation...note that only databanks with gcg sequence format are
>>accepted! ...if you have another SRS installation on your site than the
>>safest is to keep that site but, of course, the indices from the gcg srs 
>>installations can be used. The directory with the indices can be
>>specified for each databank: attribute "indexDir" of the #libenv record
>>in the file srsdb.sdl
>>
>
>Hmmm....this is not very clear...change the srsdb.sdl in the GCG 8.1
>installation (i.e. $GCGBASEROOT/gcgcore/sdl/srsdb.sdl) or in the
>real SRS (i.e. $SRSSDL) ?

..real SRS!

Thure

-- 
===============================================================================
Thure Etzold                                   | EMBL
E-mail: etzold@embl-heidelberg.de              | Postfach 10.2209
Tel: (49) 6221 387529                          | 69012 Heidelberg
Fax: (49) 6221 387517                          | Germany


From owner-srs@net.bio.net Wed Aug 16 23:00:00 1995
Path: biosci!daresbury!nntp-trd.UNINETT.no!nntp.uio.no!biotek11
From: rodrigol@biotek.uio.no (Rodrigo Lopez)
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: GCG 8.1 clash with SRS
Date: Thu, 17 Aug 95 14:00:53 GMT
Organization: Norwegian EMBnet node
Lines: 23
Message-ID: <40ut6e$erd@hermod.uio.no>
References: <PMR.95Aug15162711@unst.sanger.ac.uk> <40sn51$dg4@lion.embl-heidelberg.de>
NNTP-Posting-Host: biotek11.uio.no
X-Newsreader: News Xpress Version 1.0 Beta #3
Xref: biosci bionet.software.gcg:1348 bionet.software.srs:131

In article <40sn51$dg4@lion.embl-heidelberg.de>, Thure Etzold <etzold> wrote:

>i think it should be safe to add more sequence databanks to the gcg 
>installation...note that only databanks with gcg sequence format are
>accepted! ...if you have another SRS installation on your site than the
>safest is to keep that site but, of course, the indices from the gcg srs 
>installations can be used. The directory with the indices can be
>specified for each databank: attribute "indexDir" of the #libenv record
>in the file srsdb.sdl
>

Hmmm....this is not very clear...change the srsdb.sdl in the GCG 8.1
installation (i.e. $GCGBASEROOT/gcgcore/sdl/srsdb.sdl) or in the
real SRS (i.e. $SRSSDL) ?

R:)

-------------------------------------------------------
Rodrigo Lopez
The Norwegian EMBnet node
Gaustadalleen 21, 0317 Oslo
http://biomaster.uio.no
-------------------------------------------------------

From owner-srs@net.bio.net Wed Aug 16 23:00:00 1995
Newsgroups: bionet.software.srs
Path: biosci!daresbury!bioftp.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Subject: fasta file writing 
Message-ID: <1995Aug17.190741.21617@comp.bioz.unibas.ch>
Organization: EMBnet Switzerland [Basel]
X-Newsreader: TIN [version 1.2 PL2]
References: <40vfeu$jb3@usenet.ucs.indiana.edu> <PMR.95Aug17151329@unst.sanger.ac.uk>
Date: Thu, 17 Aug 1995 19:07:41 GMT
Lines: 190

Peter Rice (pmr@sanger.ac.uk) wrote:

: On a related note, I would like to be able to put some extra text on the
: ">" line of FASTA format output - with the aim of using SRS to generate a
: database subset for blast indexing and searching. Then (for example) the
: accession number and definition could appear in the blast search results.

The accession number is a bit more effort than the description itself, 
which is what we've done earlier by modifying the corresponding routines.

You might use SeqWriteFasta in seq.c as a template and call it MyWriteFasta.

INT4 MyWriteFasta (SEQo *seq, char *nam, INT4 (*print)(char*,...))
...

  print (">%s %60.60s\n", nam == NULL ? seq->name : nam, seq->cmnt);
                                                         ^^^^^^^^^^ 
...                                                see seq.h - this is the seq.
                                                   description.

The only problem is that seq.h declares cmnt as a pointer rather than an array,
which is OK in most places, but it is nicer if it is treated like 'nam'.
In particular, cmnt is treated as a string in the SeqWrite routine but is
never populated, because the code that fills it is commented out.
I have made it read SEQxXCMNT characters, which is OK in all examples I tested.
We use home-produced GCG format, which does not have the LEN: field. The
format we use is closer to 'NBRF' format, which has been constant for years.
Unfortunately, Thure tests for LEN: as he assumes that GCG format is produced
by GCG tools, and this made our names disappear entirely in the SeqWriteFasta
routine. I changed that, and the cmnt change from pointer to array
made it necessary to change seq.h, too:

40,41c40
< /*  char  *cmnt;  */   /* sequence description */
<   char  cmnt[SEQxXCMNT];     /* sequence description */
---
>   char  *cmnt;     /* sequence description */

Also, since SeqWrite is a static internal function, you must change
this and declare the prototype non-static in both seq.h and seq.c
(this is not in the diffs).
 
in seq.c I had: 

 bioa.embnet.unibas.ch > diff seq.c seq.c.orig
156c156
<     strcpy(tmp->cmnt, "");
---
>     tmp->cmnt = NULL;
191,192c191,192
<     /* if (tmp->cmnt)
<       free(tmp->cmnt); */
---
>     if (tmp->cmnt)
>       free(tmp->cmnt);
911c911,912
<   strncpy ((*seq)->cmnt, file->ln,SEQxXCMNT);
---
>   strcpy ((*seq)->cmnt, file->ln);
>
1327,1330c1328
<   } else { /* RD */
<      sscanf (file->ln, "%s", (*seq)->name);
<    }
<
---
>   }
1340c1338
<   strncpy ((*seq)->cmnt, file->ln,SEQxXCMNT); /* was commented out, copied in again, RD */
---
>      /*  strcpy (seq->cmnt, file->ln); */
 bioa.embnet.unibas.ch >
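
One general C point about the strncpy calls in the diff: strncpy does not
write a terminating '\0' when the source is at least as long as the count.
Whether file->ln can ever be that long is not known from the diff, so the
following is only a defensive sketch. CMNT_MAX and copy_cmnt are
hypothetical names; CMNT_MAX stands in for SEQxXCMNT, whose real value is
not given here.

```c
#include <string.h>

#define CMNT_MAX 80   /* hypothetical stand-in for SEQxXCMNT */

/* Copy src into a CMNT_MAX-byte buffer, truncating if needed and
 * always leaving a terminating '\0'. */
void copy_cmnt(char dst[CMNT_MAX], const char *src)
{
    strncpy(dst, src, CMNT_MAX - 1);
    dst[CMNT_MAX - 1] = '\0';
}
```

A description longer than the buffer is cut to CMNT_MAX - 1 characters, so
a later print with %s cannot run past the end of the array.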



To make it work you could use the following as an example.
I use the code as listed below, and have made SeqWrite a non-static function
by adding it to seq.h and commenting out 'static' in seq.c.
This example writes all swissprot sequences with a particular keyword
in the desired fasta format.


 bioz.embnet.unibas.ch >  ./test | more
166 entries written to set "q"
>ACTB_DICDI  CALCIUM-REGULATED ACTIN BUNDLING PROTEIN.
MAETKVAPNLTGIEQTKAGQSFTEKLSAEAMEFFCNVAKLPFSQQAVHFL
NAYWAEVSKEAEFIYSVGWETIKYADMHCKGIQLVFKYDEGNDLDFDIAL
YFYEQLCKFCEDPKNKNYATTYPISQPQMLTALKRKQELREKVDVNFDGR
VSFLEYLLYQYKDFANPADFCTRSMNHDEHPEIKKARLALEEVNKRIRAY
EEEKARLTEESKIPGVKGLGATNMLAQIDSGPLKEQLNFALISAEAAVRT
ASKKYGGAAYSGGAGDAGAGSSAGAIWWMNRDLEEKKKRYGPQKK
>ARP_EUGGR   CALCIUM-BINDING ACIDIC-REPEAT PROTEIN PRECURSOR (ARP).
MSHLWCWLFLVLCLACLVLSIEAKDSDGDGLLDVDEINVYFTDPYNADSD
QDGLTDGLEVNRHQTHPQDKDTDDDSIGDGVEVNNLGTNPKDPDSDDDGL
TDGAEVNLYRTDPLDADSTTTGCPMGGGAEVRHRPQNGDTDDDGLTDGAE
VNVHRTNPQDGDSDDDGLSDGAEVNTYHSNPKDGDSDDDGVSDGAEVNPK
LKDSDGDGLTDEEEIKLYRTDPFCADSDFDGLLDGEEVKVHKTNPLDGDS
DDDGLGDGAEVTHFNTNPLDADSDNDGLDDGEEINVHGTDPEDPDSDNDG
LNDGDEVNVYNTDPEEDDSDEDGVCDGAEVNVHHTNPKDEDSDNDGIPDG
AEINTHKTDPNDEDSDDDGIADGAEVTLTDSDGDGLPDEDEVALYNTNPA
NADSDYDGLTDGAEVKRYQSNPLDKDTDDDGLGDGVEVTVGTDPHDATVT
TTGSRTAVEINVHGSDPNDEDTDDDGLTDGAEVNLHRTDPEDADTDDDGL
TDGAEVNTYR
... 


etc. The source is: 


#include <stdio.h>
#include "srs.h"


INT4 MyWriteFasta (SEQo *seq, char *nam, INT4 (*print)(char*,...))
{
  INT4 lncnt = 0;

  if (!print)
    print = (INT4 (*)(char*,...)) printf;

  if (!seq) return 0;
  print (">%s %60.60s\n", nam == NULL ? seq->name : nam, seq->cmnt);
  lncnt = SeqWrite (seq->seq, seq->len, 50, 50, 0, print); 
  return lncnt+1; 
       
}


int main ()
 {
   SETo     *set;
   IDoENTRY id;
   ENTRYo   *entry;
   int      n, entryN;
   SEQo      *seq=NULL;
   
   SrsEnv ();
   LibOpen ("srswin");
   
   ParDefStr ("fieldList", "ID");
   ParDefStr ("fieldList", "Definition");
   
   if (QryDo ("[swissprot-definition:calcium*]","Q")) {
     set = SetGet ("Q"); 
     entryN = SetSize ("Q");
     
     for (n=1;  n <= entryN; n++) {
       SetGetID (set, n, &id); 
       entry = EntryOpen (&id);
       SlbGetSequence (entry, &seq);
       /* cast matches the INT4 print callback declared by MyWriteFasta */
       MyWriteFasta (seq, NULL, (INT4 (*)(char*,...)) printf); 
       SeqClr (&seq);
       
       EntryClose (&entry);
     }
   }
   return 0;
 }
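
For readers without the SRS library at hand, the line-wrapping part of the
example can be tried on its own. The following is only a sketch in plain C,
not the SRS SeqWrite routine: fasta_format and its parameters are made-up
names, and it assumes the sequence is an ordinary NUL-terminated string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format one FASTA record into `out`.
   Returns the number of lines written (header included), or -1 if the
   buffer is too small. Hypothetical helper, not part of the SRS API. */
int fasta_format(char *out, size_t outsz, const char *name,
                 const char *desc, const char *seq, size_t width)
{
    size_t len, pos, i;
    int lines = 1, n;

    assert(out != NULL && name != NULL && seq != NULL && width > 0);
    len = strlen(seq);

    /* Header line: ">" name, then an optional free-text description. */
    n = snprintf(out, outsz, ">%s%s%s\n", name,
                 desc ? " " : "", desc ? desc : "");
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    pos = (size_t)n;

    /* Sequence body: at most `width` residues per line. */
    for (i = 0; i < len; i += width) {
        size_t chunk = (len - i < width) ? len - i : width;
        if (pos + chunk + 2 > outsz)   /* chunk + newline + terminator */
            return -1;
        memcpy(out + pos, seq + i, chunk);
        pos += chunk;
        out[pos++] = '\n';
        lines++;
    }
    out[pos] = '\0';
    return lines;
}
```

Called with width 50 it gives a layout similar to the output shown above,
and the desc argument is where extra text such as a definition would go.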
     

The makefile I used was 
 bioz.embnet.unibas.ch > cat Makefile

CC = cc  -g
SRSEXE = /biox6/srs/SRS/srs4_06/bin/osf
SRC = /biox6/srs/SRS/srs4_06/src
INCLUDE = -I$(SRC) -I$(SRSEXE)
LIBS =
LOADFLAGS = $(LIBS) $(SRSEXE)/libsrs.a

default: test

test: test.c
        $(CC) test.c $(INCLUDE) $(LOADFLAGS) -o test




Maybe this helps.
Regards
Reinhard Doelz
BioComputing Basel


 
-- 
 R.Doelz         Klingelbergstr.70| Tel. x41 61 267 2247  Fax x41 61 267 2078|
 Biocomputing        CH 4056 Basel| electronic Mail    doelz@ubaclu.unibas.ch|
 Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
<a href=http://beta.embnet.unibas.ch/>EMBnet Switzerland:info@ch.embnet.org</a> 

From owner-srs@net.bio.net Wed Aug 16 23:00:00 1995
Path: biosci!bcm!cs.utexas.edu!news.sprintlink.net!howland.reston.ans.net!vixen.cso.uiuc.edu!usenet.ucs.indiana.edu!sunflower.bio.indiana.edu!gilbertd
From: gilbertd@sunflower.bio.indiana.edu (Don Gilbert)
Newsgroups: bionet.software.srs
Subject: Re: Suggestions for SRS use w/ nonsequence gene data
Date: 17 Aug 1995 17:39:08 GMT
Organization: Biology, Indiana University - Bloomington
Lines: 9
Message-ID: <40vurs$5mi@usenet.ucs.indiana.edu>
References: <40vfeu$jb3@usenet.ucs.indiana.edu> <40vm24$bns@lion.embl-heidelberg.de>
NNTP-Posting-Host: sunflower.bio.indiana.edu

Thure,

That sounds like it may answer most of my needs.  I don't have
a need for case-sensitive indexing at this point.

Thanks much, Don

-- 
-- d.gilbert--biocomputing--indiana u--bloomington--gilbertd@bio.indiana.edu

From owner-srs@net.bio.net Wed Aug 16 23:00:00 1995
Path: biosci!bcm!cs.utexas.edu!howland.reston.ans.net!xlink.net!news.dfn.de!news.embl-heidelberg.de!usenet
From: Thure Etzold <etzold>
Newsgroups: bionet.software.srs
Subject: Re: Suggestions for SRS use w/ nonsequence gene data
Date: 17 Aug 1995 15:08:52 GMT
Organization: EMBL Heidelberg
Lines: 137
Distribution: world
Message-ID: <40vm24$bns@lion.embl-heidelberg.de>
References: <40vfeu$jb3@usenet.ucs.indiana.edu>
NNTP-Posting-Host: phenix.embl-heidelberg.de
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 1.1S (X11; I; IRIX64 6.0.1 IP21)
X-URL: news:40vfeu$jb3@usenet.ucs.indiana.edu

Hi Don,

most of the changes you suggest will be covered by the new parser which 
we currently are integrating into SRS ...it will be almost a revolution;-)
The new parser is called Icarus (Interpreter for Commands And RecUrsive Syntax) 

>
>Here are some suggestions for SRS that have arisen from trying to use
>it with Drosophila genome data:
>
>1) indexer interface: needs to permit indexing of any character/symbol set.
>   Drosophila genes use just about the full ASCII printable symbol set
>   (and would use more if possible).

this is no problem for the indices ...any character is allowed ...but
alphabetical characters are converted to lowercase - is that a problem?
to have case conversion simplifies matters greatly but i can make this
optional. Some characters can be a problem for the parser ...i think peter
rice figured out how to specify a '/' ('////'?) 
..tried this with the new parser ...the '/' is ok there

The other problem is that certain characters have a meaning in the SRS query
language, eg, "&|!" are logical operators - i will allow quoting the search
words so that these characters don't interfere with the SRS syntax. 

>   
>   It would be nice to allow also adding data filter functions to the indexer
>   that would convert various computer format data to data suitable for
>   indexing -- e.g., convert special codes like '&bgr;-tubulin' into english 
>   equivalents 'beta-tubulin' for the indexing.
>   

Icarus has an easy interface to call and integrate C functions 
so you can do the following:

/&..;/ <rep s:decode(:$ct)>;

starts with a regular expression for eg '&br;' - the command 'rep'
inside '<' and '>' replaces the current token ($ct - the match of the regular
expression) by the string (argument 's') that is returned by function 'decode'
which receives the current token ($ct) as argument. Decode can be your 
C function which is declared somewhere in the syntax.
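
As an illustration of what such a decode function might look like on the C
side, here is a minimal sketch. It is not part of Icarus or SRS: the name
decode_codes, its signature and the table are assumptions, and only the two
codes that appear in this thread (&agr; and &bgr;) are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct code_map { const char *code; const char *text; };

/* Only the codes seen in the flybase example; extend as needed. */
static const struct code_map greek_codes[] = {
    { "&agr;", "alpha" },
    { "&bgr;", "beta"  },
};

/* Copy `in` to `out`, expanding known codes to their English names.
   Returns 0 on success, -1 if `out` is too small (contents then undefined).
   Hypothetical helper for illustration only. */
int decode_codes(const char *in, char *out, size_t outsz)
{
    size_t pos = 0;

    assert(in != NULL && out != NULL && outsz > 0);

    while (*in != '\0') {
        const char *rep = in;          /* default: copy one character */
        size_t replen = 1, skip = 1, k;

        for (k = 0; k < sizeof greek_codes / sizeof greek_codes[0]; k++) {
            size_t clen = strlen(greek_codes[k].code);
            if (strncmp(in, greek_codes[k].code, clen) == 0) {
                rep = greek_codes[k].text;
                replen = strlen(rep);
                skip = clen;
                break;
            }
        }
        if (pos + replen + 1 > outsz)  /* keep room for the terminator */
            return -1;
        memcpy(out + pos, rep, replen);
        pos += replen;
        in += skip;
    }
    out[pos] = '\0';
    return 0;
}
```

With this, "&bgr;Tub97EF" comes out as "betaTub97EF"; codes that are not in
the table are copied through unchanged.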

>   It would help if data parsing language for indexer was not
>   as difficult to write accurately.  If I don't spend a lot of
>   time testing a new parsing, I can't be confident that the indexer
>   is getting everything it should.  The recent example of missing
>   2,000 of 9,000 entries in the flygene data due to the symbol "\" in 
>   gene names makes this point.

yes that has been a problem ...testing in icarus will be MUCH easier ...you
can insert print statements anywhere or call a trace option - also the syntax
can be put into a single file so that the recompilation of all .sdl files
is not required after every change.

>   
>   
>2) query interface: needs to permit any character/symbol set to be valid
>   data in the query, and query symbols should be configurable. 
>    
>   Use of words instead of symbols as query operators
>   should be optional at least, and by my preference they would be default.
>   E.g., a query like this should be possible:  
>   
>      databank1  fieldA  some/*![-]()=+messy&^%!%@*#string  
>      and
>      databank2  fieldB  another%^#*&@P)!Q(@string
>      but not
>      databank3  fieldC  more#*@(#P*#strings

ok good idea ...of course the quoting of search words should help but,
yes, it could be a nice idea to let people define their own symbols or
words for the operators

>      
>
>3) output interface:  needs to allow addition of post-processor functions
>   to convert data to various human-usable formats.  This is done now in
>   part for sequence data and for adding html links, but not in a
>   general way that would allow addition output formatting per database
>   w/o rewriting the basic SRS code.

This is where we cracked our heads with Icarus ...we have added the
concept of "tasks" which let you specify things to be done only if 
a certain task is selected ...this works out beautifully for, eg, adding
hypertext links (this means that the file hyperlink.sdl will go!)

>
>   Here is roughly how I did it for flybase data, but it is a hack
>   not a general solution.  Example outputs show this formatted output 
>   from iubio server, versus the computer "star code" output from the 
>   sanger server.
>
>
here is some Icarus code that does the conversion of the first line of
your example and converts

>*a &bgr;Tub97EF

into 

Gene symbol                  : betaTub97EF

the syntax for the original line is:

gene-sym <et wrt> = '*a /[^\n]+/;

"et" means that before parsing a token table with the name "gene-sym" is
created. "wrt" means that the entire line is written as a token into that
table. This token is essentially the whole data-field to be used for
printing, extracting indexable keys ...and so on - and you can use it as input
for the conversion:

display-gene-sym <in:gene-sym task:display>
       '*a <p:"Gene symbol          : "> /[^\n ]+/ <p:decode(:$ct)>;

this requests the gene-sym token as input and is executed only if
the task "display" is set - 'p' stands for "print". the regular expression
between the '/' describes the gene symbol itself which is then submitted
to the "decode" function and printed.

The integration of Icarus will still take some time but we hope to finish
it in ~2-3 months. The problem is of course, as you said, that it is very
tempting to move things that happen now inside C-code into Icarus!

regards
thure



-- 
===============================================================================
Thure Etzold                                   | EMBL
E-mail: etzold@embl-heidelberg.de              | Postfach 10.2209
Tel: (49) 6221 387529                          | 69012 Heidelberg
Fax: (49) 6221 387517                          | Germany


From owner-srs@net.bio.net Wed Aug 16 23:00:00 1995
Path: biosci!bcm!news.msfc.nasa.gov!newsfeed.internetmci.com!gatech!howland.reston.ans.net!tank.news.pipex.net!pipex!sunsite.doc.ic.ac.uk!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk (Peter Rice)
Newsgroups: bionet.software.srs
Subject: Re: Suggestions for SRS use w/ nonsequence gene data
Date: 17 Aug 1995 14:13:28 GMT
Organization: The Sanger Centre
Lines: 50
Message-ID: <PMR.95Aug17151329@unst.sanger.ac.uk>
References: <40vfeu$jb3@usenet.ucs.indiana.edu>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: gilbertd@sunflower.bio.indiana.edu's message of 17 Aug 1995 13:16:14 GMT

In article <40vfeu$jb3@usenet.ucs.indiana.edu> gilbertd@sunflower.bio.indiana.edu (Don Gilbert) writes:
>   Here are some suggestions for SRS that have arisen from trying to use
>   it with Drosophila genome data:
>
>   1) indexer interface: needs to permit indexing of any character/symbol set.
>      Drosophila genes use just about the full ASCII printable symbol set
>      (and would use more if possible).

One for Thure, but I suspect your wish (and mine) will soon be granted :-)

>   2) query interface: needs to permit any character/symbol set to be valid
>      data in the query, and query symbols should be configurable. 
>
>      Use of words instead of symbols as query operators
>      should be optional at least, and by my preference they would be default.

I would be happy to just have escaping of critical characters like \(\)\&\|
but certainly something is needed.
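
Such escaping could be done on the client side before a query is built. A
rough sketch in C, assuming a backslash as the escape character:
escape_query_word is a hypothetical helper, and this thread does not say
what syntax SRS will finally accept, so treat the character list as a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefix characters that act as query operators with a backslash.
   The set "()&|!" plus the backslash itself is an assumption based on
   the operators mentioned in this thread.
   Returns 0 on success, -1 if `out` is too small. */
int escape_query_word(const char *in, char *out, size_t outsz)
{
    static const char special[] = "()&|!\\";
    size_t pos = 0;

    assert(in != NULL && out != NULL && outsz > 0);

    for (; *in != '\0'; in++) {
        size_t need = strchr(special, *in) ? 2 : 1;
        if (pos + need + 1 > outsz)    /* keep room for the terminator */
            return -1;
        if (need == 2)
            out[pos++] = '\\';
        out[pos++] = *in;
    }
    out[pos] = '\0';
    return 0;
}
```

A search word like P)!Q( would then be passed on as P\)\!Q\( and the
operators inside it would no longer be seen as query syntax.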

>   3) output interface:  needs to allow addition of post-processor functions
>      to convert data to various human-usable formats.  This is done now in
>      part for sequence data and for adding html links, but not in a
>      general way that would allow addition output formatting per database
>      w/o rewriting the basic SRS code.
>
>      Here is roughly how I did it for flybase data, but it is a hack
>      not a general solution.  Example outputs show this formatted output 
>      from iubio server, versus the computer "star code" output from the 
>      sanger server.

This looks a really neat idea !!

Not only is it nice in the flybase cases, but it could also save me the
problem of converting many tab-delimited databases with perl scripts
into something with readable "entries" that I can parse and display.

Of course, first I need a parser that can handle tab-delimited databases :-)

On a related note, I would like to be able to put some extra text on the
">" line of FASTA format output - with the aim of using SRS to generate a
database subset for blast indexing and searching. Then (for example) the
accession number and definition could appear in the blast search results.

--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division
E-mail: pmr@sanger.ac.uk             | The Sanger Centre
Tel: (44) 1223 494967                | Hinxton Hall, Hinxton,
Fax: (44) 1223 494919                | Cambs, CB10 1RQ
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Wed Aug 16 23:00:00 1995
Path: biosci!daresbury!nntp-trd.UNINETT.no!Norway.EU.net!EU.net!howland.reston.ans.net!vixen.cso.uiuc.edu!usenet.ucs.indiana.edu!sunflower!gilbertd
From: gilbertd@sunflower.bio.indiana.edu (Don Gilbert)
Newsgroups: bionet.software.srs
Subject: Suggestions for SRS use w/ nonsequence gene data
Date: 17 Aug 1995 13:16:14 GMT
Organization: Biology, Indiana University - Bloomington
Lines: 231
Message-ID: <40vfeu$jb3@usenet.ucs.indiana.edu>
NNTP-Posting-Host: sunflower.bio.indiana.edu
X-Newsreader: TIN [version 1.2 PL2]


Here are some suggestions for SRS that have arisen from trying to use
it with Drosophila genome data:

1) indexer interface: needs to permit indexing of any character/symbol set.
   Drosophila genes use just about the full ASCII printable symbol set
   (and would use more if possible).
   
   It would be nice to allow also adding data filter functions to the indexer
   that would convert various computer format data to data suitable for
   indexing -- e.g., convert special codes like '&bgr;-tubulin' into english 
   equivalents 'beta-tubulin' for the indexing.
   
   It would help if data parsing language for indexer was not
   as difficult to write accurately.  If I don't spend a lot of
   time testing a new parsing, I can't be confident that the indexer
   is getting everything it should.  The recent example of missing
   2,000 of 9,000 entries in the flygene data due to the symbol "\" in 
   gene names makes this point.
   
   
2) query interface: needs to permit any character/symbol set to be valid
   data in the query, and query symbols should be configurable. 
    
   Use of words instead of symbols as query operators
   should be optional at least, and by my preference they would be default.
   E.g., a query like this should be possible:  
   
      databank1  fieldA  some/*![-]()=+messy&^%!%@*#string  
      and
      databank2  fieldB  another%^#*&@P)!Q(@string
      but not
      databank3  fieldC  more#*@(#P*#strings
      

3) output interface:  needs to allow addition of post-processor functions
   to convert data to various human-usable formats.  This is done now in
   part for sequence data and for adding html links, but not in a
   general way that would allow addition output formatting per database
   w/o rewriting the basic SRS code.

   Here is roughly how I did it for flybase data, but it is a hack
   not a general solution.  Example outputs show this formatted output 
   from iubio server, versus the computer "star code" output from the 
   sanger server.


lynx 'http://iubio.bio.indiana.edu:81/srs/srsc?[FLYGENE-acc:FBgn0003890]'

Gene symbol                  : betaTub97EF
Last update                  : 11 Jul 95
Synonym(s)                   : beta-Tub97EF
                             : beta1t
                             : beta4t
                             : betaTub4
                             : B4t
FlyBase gene id number       : FBgn0003890
Full name                    : betaTubulin97EF
Genetic map position         : 3-[92]
Cytological map position     :
    Located in 97E-F by in situ hybridization (Natzle and McCarthy,
    1984).
Function(s) of product       : beta-tubulin
                             : tubulin
D. mel. DNA/RNA AC no(s)     : X69560
                             : M20419
Phenotypic information       :
    Tubulins are the main
    structural components of microtubules in mitotic and
    meiotic spindles, cilia, flagella, neural processes
    and the cytoskeleton; nontubulin proteins (MAPS or
    microtubule-associated proteins) are involved along
    with tubulins in the formation of specialized
    microtubules (Theurkauf, Baum, Bo and Wensink, 1986; Rudolph,
    Kimble, Hoyle, Subler and Raff, 1987).
    Tubulin proteins are found in a wide variety of
    species from unicellular organisms to man; their
    biochemical and molecular structure is highly
    conserved. The alpha- and beta-subunits from different
 ...
==============================

lynx 'http://www.sanger.ac.uk/srs/srsc?[FLYGENE-acc:FBgn0003890]'

*a &bgr;Tub97EF
*H Last updated 11 Jul 95
*i &bgr;-Tub97EF
*i &bgr;1t
*i &bgr;4t
*i &bgr;Tub4
*i B4t
*z FBgn0003890
*e &bgr;Tubulin97EF
*b 3-[92]
*c Located in 97E-F by in situ hybridization (Natzle and McCarthy, 1984).
*d &bgr;-tubulin
*d tubulin
*g X69560
*g M20419
*p Tubulins are the main
*p structural components of microtubules in mitotic and
*p meiotic spindles, cilia, flagella, neural processes
*p and the cytoskeleton; nontubulin proteins (MAPS or
*p microtubule-associated proteins) are involved along
*p with tubulins in the formation of specialized
*p microtubules (Theurkauf, Baum, Bo and Wensink, 1986; Rudolph,
*p Kimble, Hoyle, Subler and Raff, 1987).
*p Tubulin proteins are found in a wide variety of
*p species from unicellular organisms to man; their
*p biochemical and molecular structure is highly
*p conserved. The &agr;- and &bgr;-subunits from different
...
==========================
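
The mapping between the two outputs above can be written down as a small
table. What follows is a minimal C sketch, assuming one star-code line per
call; format_star_line is a made-up name and is not the fbcode2report
routine. The labels are copied from the first output. The sketch does no
&bgr; conversion, and it does not reproduce the continuation-line layout or
the trimming of "Last updated".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct star_label { char code; const char *label; };

/* Star codes and labels as they appear in the two outputs above. */
static const struct star_label labels[] = {
    { 'a', "Gene symbol" },
    { 'H', "Last update" },
    { 'i', "Synonym(s)" },
    { 'z', "FlyBase gene id number" },
    { 'e', "Full name" },
    { 'b', "Genetic map position" },
    { 'c', "Cytological map position" },
    { 'd', "Function(s) of product" },
    { 'g', "D. mel. DNA/RNA AC no(s)" },
    { 'p', "Phenotypic information" },
};

/* Turn one "*x text" line into "Label<padding>: text\n".
   Returns 0 on success, -1 for a malformed line, an unknown code,
   or a buffer that is too small. */
int format_star_line(const char *line, char *out, size_t outsz)
{
    size_t k;

    assert(line != NULL && out != NULL && outsz > 0);

    if (line[0] != '*' || line[1] == '\0' || line[2] != ' ')
        return -1;

    for (k = 0; k < sizeof labels / sizeof labels[0]; k++) {
        if (labels[k].code == line[1]) {
            /* Pad the label to 29 columns so the colons line up. */
            int n = snprintf(out, outsz, "%-29s: %s\n",
                             labels[k].label, line + 3);
            return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
        }
    }
    return -1;
}
```

Feeding it "*z FBgn0003890" gives the "FlyBase gene id number" line with the
colon in the same column as the other labels.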



Changes needed to add output formatting function for a given
database, in srs/src/srswww.c -------------


Boolean gDoflyb = FALSE; /* dgg */
enum { kPlainText = 3 };
char* fbcode2report( char* inbuf, short outformat, short state);

/* fbcode2report code is from fbgenereport.c available in 
  portable flybase server source at
  http://flybase.bio.indiana.edu:82/1/work/Portable-server/source/flyreports/
*/
      
INT4 WwwPrintSet (char *setName, int firstN, int printN, SCRIPTo *script)
{
...
  for (k=firstN;  k <= lastN && k <= setEntryN;  k++) {
    SetGetID (set, k, &id); 
    entry = EntryOpen (&id);
    if (entry) { 
      if (ParGetNum ("printLinkTable"))
        WwwPrintEntryLinks (entry, script);
      else {
/* --- add for output formatting - dgg --- */    
        /* dgg hack to test formatting calls */
        gDoflyb= (0== strcmp(id.id_d->nam, "FlybEntry-ID"));
/* ^^^ add for output formatting - dgg ^^^ */    
        WwwEntryPrint (entry, script);
        }
      EntryClose (&entry);
      }
}


static void FlybPrintBuff (char *ln)
{
  char* newbuf= fbcode2report( ln, kPlainText, 1);  
  if (newbuf) {
    strcat (gBuff, newbuf);
    free(newbuf);
    }
}

static void WwwPrintField (ENTRYo *entry, INT4 doPrintAll)
{
  static PRSoST *tokList=NULL;
  LIBoHYPERLINK *hLink;
  INT4           context, (*printSave)(), lineNSave;

#ifdef MAC
  if (gBuff == NULL) gBuff = (char*) malloc( (WWWxMAXLINESIZE+1) * sizeof(char));
#endif
  
  if (!tokList)
    PrsIniSym (&tokList, 50, 500);
  else
    PrsResetSym (tokList);

  lineNSave = entry->file[0]->n;


  if (doPrintAll || LibIsField (entry->field, "active")) {
    if (ParGetNum ("doInsertHyperLinks")) {
      for (context=0; (hLink = LibNextHyperLink (&context));) {
  if (entry->field == hLink->field) {
    gBuff[0] = '\0';  /* reset global buffer */
    printSave = ParGetFunction ("printf");
    ParDefFunction ("printf", (INT4(*)()) WwwPrintBuff);
    
    if (EntryFieldPrint (entry, 1)) {
      entryCurr = entry; /* set global entry */
      PrsString (gBuff, hLink->parser, tokList, hLink->parse, NULL);
      ParDefFunction ("printf", printSave);

/* --- add for output formatting - dgg --- */    
      if (gDoflyb) {
        char* newbuf= fbcode2report( gBuff, kPlainText, 1);  
        if (newbuf) { 
          printf ("%s", newbuf);
          free(newbuf);
          }
        }
      else    
 /* ^^^ add for output formatting - dgg ^^^ */    
        WwwPrintF (gBuff);
       
    }
    else
      FilURead (entry->file[0]);
    return;
  }
      }
    }

/* --- add for output formatting - dgg --- */    
   if (gDoflyb) {
      gBuff[0] = '\0';  
      printSave = ParGetFunction ("printf");
      ParDefFunction ("printf", (INT4(*)()) FlybPrintBuff);
      EntryFieldPrint (entry,1);
      printf ("%s", gBuff); /*  WwwPrintF (gBuff);*/
      ParDefFunction ("printf", printSave);
      }
    else
 /* ^^^ add for output formatting - dgg ^^^ */    
     EntryFieldPrint (entry,1);
 }
  if (lineNSave == entry->file[0]->n)
    FilURead (entry->file[0]);

  return;
}
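
The core trick in the code above is to swap the print function, collect the
output in a buffer, and then put the old function back. Stripped of the SRS
calls it looks roughly like this in plain C. All names are invented for the
sketch (current_print stands in for what ParGetFunction and ParDefFunction
manage), and the append here checks the buffer size before copying.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef int (*print_fn)(const char *text);

/* Default hook: write straight to stdout. */
int print_stdout(const char *text)
{
    return fputs(text, stdout) < 0 ? -1 : 0;
}

static print_fn current_print = print_stdout;  /* the active print hook */
static char capture_buf[256];                  /* collects captured output */

/* Replacement hook: append to capture_buf instead of printing. */
int print_to_buffer(const char *text)
{
    size_t used = strlen(capture_buf), add = strlen(text);

    if (used + add + 1 > sizeof capture_buf)
        return -1;                             /* would overflow: drop it */
    memcpy(capture_buf + used, text, add + 1); /* copies the terminator too */
    return 0;
}

/* Stand-in for a library routine that prints through the hook. */
void emit_field(void)
{
    current_print("*a ");
    current_print("&bgr;Tub97EF\n");
}

/* Run `producer` with output redirected into capture_buf, then
   restore whatever hook was active before. */
const char *capture_output(void (*producer)(void))
{
    print_fn saved;

    assert(producer != NULL);
    saved = current_print;
    capture_buf[0] = '\0';
    current_print = print_to_buffer;
    producer();
    current_print = saved;
    return capture_buf;
}
```

The captured text can then be handed to a formatter before it is printed,
which is the role fbcode2report plays in the code above.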

--
-- d.gilbert--biocomputing--indiana u--bloomington--gilbertd@bio.indiana.edu

From owner-srs@net.bio.net Wed Aug 16 23:00:00 1995
Newsgroups: bionet.software.gcg,bionet.software.srs
Path: biosci!daresbury!bioftp.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Subject: Re: GCG 8.1 clash with SRS
Message-ID: <1995Aug17.192436.22344@comp.bioz.unibas.ch>
Followup-To: bionet.software.gcg,bionet.software.srs
Organization: EMBnet Switzerland [Basel]
X-Newsreader: TIN [version 1.2 PL2]
References: <PMR.95Aug15162711@unst.sanger.ac.uk> <40sn51$dg4@lion.embl-heidelberg.de>
Date: Thu, 17 Aug 1995 19:24:36 GMT
Lines: 25
Xref: biosci bionet.software.gcg:1351 bionet.software.srs:138

Thure Etzold (etzold) wrote:

: i think it should be safe to add more sequence databanks to the gcg 
: installation...note that only databanks with gcg sequence format are
: accepted! ...if you have another SRS installation on your site than the

I started playing ... The alphabet is the limiting factor. 
ASCII characters beyond z are added as menu selection options, which 
make no real impression on biologists, such as 

    |) XEMBL 
    }) XXEMBL
    ~) All libraries

:-) 
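
One plausible reading (an assumption, not taken from the lookup source) is
that the menu keys are made by counting up from 'a', so that in ASCII the
27th and later entries land on punctuation. A small C sketch with made-up
names shows the effect:

```c
#include <assert.h>
#include <ctype.h>

/* Menu key for the entry at `index` (0-based), counting up from 'a'.
   In ASCII the printable range ends at '~', which is index 29. */
char menu_key(int index)
{
    assert(index >= 0 && index <= '~' - 'a');
    return (char)('a' + index);
}

/* Nonzero while the key is still a lowercase letter. */
int menu_key_is_letter(int index)
{
    return islower((unsigned char)menu_key(index)) != 0;
}
```

Index 25 is still 'z', but index 27 is '|', 28 is '}' and 29 is '~', which
matches the three keys in the menu shown above.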


regards
Reinhard Doelz

-- 
 R.Doelz         Klingelbergstr.70| Tel. x41 61 267 2247  Fax x41 61 267 2078|
 Biocomputing        CH 4056 Basel| electronic Mail    doelz@ubaclu.unibas.ch|
 Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
<a href=http://beta.embnet.unibas.ch/>EMBnet Switzerland:info@ch.embnet.org</a> 

From owner-srs@net.bio.net Thu Aug 17 23:00:00 1995
Newsgroups: bionet.software.gcg,bionet.software.srs
Path: biosci!agate!news.ucdavis.edu!library.ucla.edu!info.ucla.edu!news.bc.net!news.uoregon.edu!vixen.cso.uiuc.edu!howland.reston.ans.net!tank.news.pipex.net!pipex!sunsite.doc.ic.ac.uk!daresbury!bioftp.unibas.ch!doelz
From: doelz@comp.bioz.unibas.ch (Reinhard Doelz)
Subject: Solution; patch for Re: GCG 8.1 clash with SRS
Message-ID: <1995Aug18.122544.4066@comp.bioz.unibas.ch>
Organization: EMBnet Switzerland [Basel]
X-Newsreader: TIN [version 1.2 PL2]
References: <PMR.95Aug15162711@unst.sanger.ac.uk> <40sn51$dg4@lion.embl-heidelberg.de> <1995Aug17.192436.22344@comp.bioz.unibas.ch>
Date: Fri, 18 Aug 1995 12:25:44 GMT
Lines: 160
Xref: biosci bionet.software.gcg:1352 bionet.software.srs:139

: Thure Etzold (etzold) wrote:

: : i think it should be safe to add more sequence databanks to the gcg 
: : installation...note that only databanks with gcg sequence format are
: : accepted! ...if you have another SRS installation on your site than the

The following will make 'lookup' (eventually it will be 'elookup' in a 
future EGCG release :-) run with the standard SRS installation.

It applies to all colleagues who would like to run SRS (generic, publicly 
available) and GCG (licensed, commercial) with the same indices. To run this
patch, you need (a) a valid GCG license and 'lookup' version 8.1, and 
(b) SRS version 4_06 or later. This patch will allow you to run 'lookup' as 
INTENDED by GCG in sequence-only mode, with all your own sequence libs; it 
neither attempts nor needs to run anything else (such as prosite). 

Before applying the patch, become a user DIFFERENT from the one owning the
gcg tree (for security reasons). Create a new directory and copy the lookup.c
from GCG to lookup.c.gcg; the build and patch steps are as follows.

You will need to follow the GCG instructions and fetch a makefile according
to the manual. Add the $SRSSOU and $SRSEXE directories to the CFLAGS for 
header inclusion, and $SRSEXE/libsrs.a for linking. My make ran as follows:

cc -DNDEBUG -Dosf -I/bioz4/srs/SRS/srs4_06/src \
   -I/bioz4/srs/SRS/srs4_06/bin/osf \
   -I/bioa1/gcg/gcgsource/include -c -g lookup.c
cc -DNDEBUG lookup.o /bioa1/gcg/gcgbin/oblib/libapp.a \
   -L/bioa1/gcg/gcgbin/oblib -lgenshare -lcurses -lm \
   /bioz4/srs/SRS/srs4_06/bin/osf/libsrs.a -o lookup

To apply the patch, extract the file attached (remove the lines
of this message INCLUDING the 'cut here' lines) and run 'patch' as follows:


 bioz.embnet.unibas.ch > patch -l < patch.lookup


Hmmm... looks like a new-style context diff to me...
The text leading up to this was:
--------------------------
|*** lookup.c.gcg       Sat Jul 15 12:35:53 1995
|--- lookup.c   Fri Aug 18 14:13:50 1995
--------------------------
Patching file lookup.c.gcg using Plan A...
Hunk #1 succeeded at 35.
Hunk #2 succeeded at 148.
Hunk #3 succeeded at 567.
Hunk #4 succeeded at 582.
Hunk #5 succeeded at 1496.
done


Rename the patched lookup.c.gcg to lookup.c. 
Compile and link, and enjoy. SRS indexing and updating work via SRS 4_06 etc.
as usual. To run it you will first need to source 'gcg' and afterwards
'prep_srs' in order to get SRSDAT assigned correctly. 

Regards
Reinhard Doelz
BioComputing Basel

--------------------------------> cut here <--------------------------


*** lookup.c.gcg        Sat Jul 15 12:35:53 1995
--- lookup.c    Fri Aug 18 14:13:50 1995
***************
*** 35,40 ****
--- 35,41 ----
  
  #define SRS_MAIN
  #include "srs.h"
+ int  (*PrintMessage)(MSGo *) = NULL;
  
  /* GCG includes and defines. */
  
***************
*** 147,153 ****
  
  #define LIVE_PARAMETER(a) ((a)->pa_value)
  
! #define MAX_ACTIVE_LIBS  10
  #define BUFFSIZ          2000000
  #define FIELDLENGTH      256
  #define RANGEFIELDLENGTH 256
--- 148,154 ----
  
  #define LIVE_PARAMETER(a) ((a)->pa_value)
  
! #define MAX_ACTIVE_LIBS  24  /* Doelz /BCB : allow 24 libraries max */ 
  #define BUFFSIZ          2000000
  #define FIELDLENGTH      256
  #define RANGEFIELDLENGTH 256
***************
*** 566,575 ****
      int lcontext;
      SLBo *l;
  
      for(lcontext = 0; l = LibNextLib(g, &lcontext); ) {
        char name[133];
        char buf[MAXPATHLEN];
! 
        LibGetIndexName(l, LibGetIdField(l), name, "r");
  
        /* check to see if we have the corresponding indices */ 
--- 567,578 ----
      int lcontext;
      SLBo *l;
  
+     if (strcmp(g->com,"Sequence") == 0) {/* Doelz/BCB: Allow only Sequences */  
+  
      for(lcontext = 0; l = LibNextLib(g, &lcontext); ) {
        char name[133];
        char buf[MAXPATHLEN];
!       if (count < 24) {   /* Doelz/BCB: Stop reading after max 23 libs */ 
        LibGetIndexName(l, LibGetIdField(l), name, "r");
  
        /* check to see if we have the corresponding indices */ 
***************
*** 579,585 ****
        if(!access(buf, R_OK))
        activeLib[count++] = l;
      }
! 
    } /* end loop */
  
    activeLib[count] = NULL;
--- 582,589 ----
        if(!access(buf, R_OK))
        activeLib[count++] = l;
        } 
!     }
!    } 
    } /* end loop */
  
    activeLib[count] = NULL;
***************
*** 1492,1498 ****
    /* write the set members */
  
    pCount = 0;
!   tmpMsgF = MsgGetFnct();
    MsgSetFnct(printListMessage); /* capture error messages */
    tmpPrintF = ParGetFunction("printf");
    ParDefFunction ("printf", (INT4 (*)()) printLine);
--- 1496,1502 ----
    /* write the set members */
  
    pCount = 0;
!   tmpMsgF = PrintMessage;  
    MsgSetFnct(printListMessage); /* capture error messages */
    tmpPrintF = ParGetFunction("printf");
    ParDefFunction ("printf", (INT4 (*)()) printLine);

--------------------------------> cut here <--------------------------

-- 
 R.Doelz         Klingelbergstr.70| Tel. x41 61 267 2247  Fax x41 61 267 2078|
 Biocomputing        CH 4056 Basel| electronic Mail    doelz@ubaclu.unibas.ch|
 Biozentrum der Universitaet Basel|-------------- Switzerland ---------------|
<a href=http://beta.embnet.unibas.ch/>EMBnet Switzerland:info@ch.embnet.org</a> 

