From owner-srs@net.bio.net Fri Nov 01 22:00:00 1996
Path: biosci!daresbury!s-ind2!mph
From: mph@s-ind2.dl.ac.uk (M.P. Hilbers)
Newsgroups: bionet.software.srs
Subject: Re: indexing Genbank 97 from gcg format
Date: 2 Nov 1996 20:51:58 GMT
Organization: Daresbury Lab, Warrington, U.K.
Lines: 39
Distribution: bionet
Message-ID: <55gc9e$3eo@mserv1.dl.ac.uk>
References: <32789728.41C6@dl.ac.uk>
NNTP-Posting-Host: s-ind2.dl.ac.uk
X-Newsreader: TIN [version 1.2 PL2]

Martin Hilbers (mph@dl.ac.uk) wrote:
: Hi folks,

: Has anybody been able to index genbank 97 from gcg formatted files ? -
: For me indexing keeps falling over on entry HSHLAK1 in the primates
: section. 
: The error message I get is:
: e__nammismatch, names are not identical
: "HSHLAK1" and "C"

: It does not occur with the plain genbank files. It can't have anything
: to do with the size of genbank, as it also occurs if I restrict 
: genbank to the primates section. As far as I can tell the gcg
: seq and ref file are completely normal for this entry.
: srsbuild is apparently supposed to produce this error message if
: the names in the ref and seq files are not the same, but as far as I
: can tell they are the same, and I really don't understand where this "C"
: in the error message comes from.

Well,  I managed to figure things out. The problem was apparently
caused by keywords in genbank extending over more than one line. The patch
to genbank.sdl suggested by Janice Coventry earlier in this newsgroup
solved the problem. Funny though that a misread keyword somehow ends up
interfering with the entry name in a completely different file. But I 
suppose that if you carefully comb the code it all makes sense.
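
For anyone hitting the same thing, here is a minimal sketch of how a field
that continues over several lines can be gathered before it is split into
keywords. It is written in Python rather than in the SRS parsing language,
and it is not the genbank.sdl patch. It assumes the usual GenBank flat-file
layout (field name in column 1, continuation lines indented with spaces);
the function name and the sample entry are invented for the example.

```python
def read_keywords(lines):
    """Collect the KEYWORDS field of one GenBank entry, including continuation lines."""
    parts = []
    in_keywords = False
    for line in lines:
        if line.startswith("KEYWORDS"):
            in_keywords = True
            parts.append(line[len("KEYWORDS"):].strip())
        elif in_keywords and line.startswith(" "):
            # Indented line: continuation of the current field.
            parts.append(line.strip())
        elif in_keywords:
            # A new field name in column 1 ends the KEYWORDS field.
            break
    text = " ".join(parts).rstrip(".")
    return [kw.strip() for kw in text.split(";") if kw.strip()]


# Hypothetical entry fragment, invented for this example.
sample = [
    "LOCUS       EXAMPLE1     1200 bp    DNA             PRI       01-JAN-1996",
    "KEYWORDS    HLA class I; major histocompatibility complex; MHC class I;",
    "            transplantation antigen.",
    "SOURCE      human.",
]
print(read_keywords(sample))
```

A reader that stops at the end of the first KEYWORDS line would hand the
indented second line to whatever rule comes next, which is one way a stray
token could end up where a name is expected.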

Cheers,

Martin


---------------------------------------------------------------------------
|  Martin Hilbers          |       E-mail: m.p.hilbers@dl.ac.uk           |
|  SEQNET                  |       Tel:    +44-1925-603492                |
|  Daresbury Laboratory    |       Fax:    +44-1925-603100                |
|  Daresbury, Warrington   |----------------------------------------------|
|  Cheshire WA4 4AD        |    SEQNET is the UK national EMBNet node     |
|  United Kingdom          |    http://www.dl.ac.uk/SEQNET/home.html      |
---------------------------------------------------------------------------

From owner-srs@net.bio.net Tue Nov 05 22:00:00 1996
Path: biosci!rutgers!mcrcr6!cmcl2!news.nyu.edu!mcrcr1.med.nyu.edu!user
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: searching in comment field
Message-ID: <browns02-0611961151380001@mcrcr1.med.nyu.edu>
From: browns02@mcrcr.med.nyu.edu (Stuart M. Brown)
Date: Wed, 06 Nov 1996 11:51:38 -0500
References: <browns02-3010961305560001@mcrcr1.med.nyu.edu>
 <558p26$pgc@dismay.ucs.indiana.edu>
Organization: NYU-MC Research Computing Resource
NNTP-Posting-Host: mcrcr1.med.nyu.edu
Lines: 50
Xref: biosci bionet.software.gcg:2085 bionet.software.srs:335

> In article <browns02-3010961305560001@mcrcr1.med.nyu.edu>,
> Stuart M. Brown <browns02@mcrcr.med.nyu.edu> wrote:
> >We are doing a bunch of BLAST searches with the EST database.
> >
> >I just got an idea - since these est's are routinely BLASTed
> >before they are submitted- and these results are generally noted
> >in the comment field, why can't I search the EST database for
> >comments that mention BLAST hits against my sequences of interest?
> >
> >I've tried LOOKUP- but the "All text" field does not appear to
> >include the comment field of the GenBank entry.  I also
> >tried ENTREZ with only a bit more success - yet I can see mentions
> >of my sequence in the comments when I do my own BLAST searches and 
> >then read the full annotation of the ESTs that we hit.

In article <558p26$pgc@dismay.ucs.indiana.edu>, gilbertd@bio.indiana.edu
(Don Gilbert) wrote:

> Stuart,
> 
> This may be a function of how SRS (or the GCG variant Lookup) is
> configured at a particular server.  At IUBIo Archive, I revised the 
> indexing for SRS to make sure the Genbank comment fields were
> searchable.  As a test of your question, I just now searched
> the Genbank EST section at IUBIo, searching "Comment" fields
> for "BLAST", and found 165 matches.
> 
> Feel free to use this SRS server for Genbank lookups. 
>   http://iubio.bio.indiana.edu:81/srs/srsc
> 
> - Don
> 
> -- d.gilbert--biocomputing--indiana u--bloomington--gilbertd@bio.indiana.edu

----------------------------------------------

Don, You found only 165 ESTs that mention "BLAST" in the comment field???

I believe that virtually all ESTs are BLAST searched before being
submitted to the database, and I have heard a statistic that about 40% of
them find one or more significant hits.  You should get hundreds of
thousands of hits, no?
Where is this information stored if it is not in the comment or
description fields??

-- 
Stuart M. Brown, Molecular Biology Consultant 
NYU-MC Research Computing Resource, Dept. of Cell Biology
550 First Ave, New York, NY 10016
Phone: (212)263-7689  FAX: (212)263-8139

From owner-srs@net.bio.net Thu Nov 07 22:00:00 1996
Path: biosci!bcm.tmc.edu!news.msfc.nasa.gov!newsfeed.internetmci.com!btnet!netcom.net.uk!nntpfeed.doc.ic.ac.uk!sunsite.doc.ic.ac.uk!lyra.csx.cam.ac.uk!hgmp.mrc.ac.uk!sanger.ac.uk!pmr
From: pmr@sanger.ac.uk
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: searching in comment field
Date: 08 Nov 1996 10:47:10 GMT
Organization: The Sanger Centre
Lines: 265
Message-ID: <PMR.96Nov8104710@unst.sanger.ac.uk>
References: <browns02-3010961305560001@mcrcr1.med.nyu.edu>
	<558p26$pgc@dismay.ucs.indiana.edu>
	<browns02-0611961151380001@mcrcr1.med.nyu.edu>
NNTP-Posting-Host: unst.sanger.ac.uk
In-reply-to: browns02@mcrcr.med.nyu.edu's message of Wed, 06 Nov 1996 11:51:38 -0500
Xref: biosci bionet.software.gcg:2087 bionet.software.srs:338

In article <browns02-0611961151380001@mcrcr1.med.nyu.edu> browns02@mcrcr.med.nyu.edu (Stuart M. Brown) writes:
>   > In article <browns02-3010961305560001@mcrcr1.med.nyu.edu>,
>   > Stuart M. Brown <browns02@mcrcr.med.nyu.edu> wrote:
>   > >We are doing a bunch of BLAST searches with the EST database.
>   > >
>   > >I just got an idea - since these est's are routinely BLASTed
>   > >before they are submitted- and these results are generally noted
>   > >in the comment field, why can't I search the EST database for
>   > >comments that mention BLAST hits against my sequences of interest?
>   > >
>   > >I've tried LOOKUP- but the "All text" field does not appear to
>   > >include the comment field of the GenBank entry.  I also
>   > >tried ENTREZ with only a bit more success - yet I can see mentions
>   > >of my sequence in the comments when I do my own BLAST searches and 
>   > >then read the full annotation of the ESTs that we hit.
>
>   In article <558p26$pgc@dismay.ucs.indiana.edu>, gilbertd@bio.indiana.edu
>   (Don Gilbert) wrote:
>
>   > Stuart,
>   > 
>   > This may be a function of how SRS (or the GCG variant Lookup) is
>   > configured at a particular server.  At IUBIo Archive, I revised the 
>   > indexing for SRS to make sure the Genbank comment fields were
>   > searchable.  As a test of your question, I just now searched
>   > the Genbank EST section at IUBIo, searching "Comment" fields
>   > for "BLAST", and found 165 matches.
>   > 
>   Don, You found only 165 ESTs that mention "BLAST" in the comment field???
>
>   I believe that virtually all ESTs are BLAST searched before being
>   submitted to the database, and I have heard a statistic that about 40% of
>   them find one or more significant hits.  You should get hundreds of
>   thousands of hits, no?
>   Where is this information stored if it is not in the comment or
>   description fields??

The information about BLAST hits is generated by NCBI and included in
the dbEST database (in the dbEST.reports file). The BLAST hit
information is updated from time to time, but as far as I am aware it
is not included in the GenBank entries unless it is used to clearly
identify the EST.

The dbEST.reports is available through SRS WWW servers at a number of
sites.

Sadly, our copy has a problem at the moment - it went over 2Gb file size
and the SGI system running our SRS WWW server does not like it. I plan to
split the file (SRS will still look the same) this weekend to work
around it.
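
A rough sketch of such a split, in Python. It assumes every entry starts
with a "dbEST Id:" line, as in the example entry further down; the function
names and the size limit are made up for illustration. Parts are only
broken between entries, so no entry is cut in half.

```python
def iter_entries(lines, start="dbEST Id:"):
    """Yield one list of lines per entry; an entry begins at a 'start' line."""
    entry = []
    for line in lines:
        if line.startswith(start) and entry:
            yield entry
            entry = []
        entry.append(line)
    if entry:
        yield entry


def split_into_parts(lines, max_bytes):
    """Group whole entries into parts that stay within max_bytes where possible."""
    parts, current, size = [], [], 0
    for entry in iter_entries(lines):
        entry_size = sum(len(line.encode()) for line in entry)
        # Start a new part if this entry would push the current one over the limit.
        if current and size + entry_size > max_bytes:
            parts.append(current)
            current, size = [], 0
        current.extend(entry)
        size += entry_size
    if current:
        parts.append(current)
    return parts


# Three tiny made-up entries and an artificially small limit.
sample = [
    "dbEST Id:\t1\n", "EST name:       EST00001\n",
    "dbEST Id:\t2\n", "EST name:       EST00002\n",
    "dbEST Id:\t3\n", "EST name:       EST00003\n",
]
parts = split_into_parts(sample, max_bytes=80)
print([len(part) for part in parts])
```

A single entry larger than the limit still ends up in a part of its own,
so the limit is a target rather than a guarantee.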

Meanwhile, I am working on the new SRS 5.0 parsing for dbEST, and will
certainly be trying to index the blast hits in some way. This gets a
little tricky - for example, it is not trivial to combine the scores
and the text for a given hit though it should be possible.

So, an obvious question: what information would you like to search for
in the BLAST hit fields?



An example dbEST.reports entry is included below. Some dbEST entries
have only nucleotide hits, some have only protein hits, and some have
no hits at all. They can have anything up to 15 hits of either type.

dbEST Id:	1
EST name:       EST00001
GenBank Acc:    M61954
GenBank gi:	272204
GDB Dsegment:	D0S2263E

CLONE INFO
Clone Id:	HHCI89
Source:         ATCC
Id as DNA:	65129
Id in host:	65128
DNA type:	cDNA

PRIMERS
Sequencing:     M13 Forward

SEQUENCE
		GCCATCCTGCGTCTGGACCTGGCTGGCCGGGACCTGACTGACTACCTCATGAAGATCCTC
		ACCGAGCGCGGCTACAGCTTCACCACCACGGCCGAGCGGGAAATCGTGCGTGACATTAAG
		GAGAAGCTGTGCTACGTCGCCCTGGACTTCGAGCAAGAGATGGCCACGGCTGCTTCCAGC
		TCCTCCCTGGAGAAGAGCTACGAGCTGCCTGACGGCCAGGTCATCACCATTGGCAATGAG
		CGGTTCCGCTGCCCTGAGGCACTCTTCCAGCCTTCCTTCCTGGGCATGGAGTCCTGTGGC
		ATCCACGGAACTACCTTCAACTCCATCATGAAGTGTGACGTGGACATTCGGAAAGACCTG
		TACGGCAACACAGTGCT

Entry Created:	May 26 1992 
Last Updated:	May 26 1992 

PUTATIVE ID	Assigned by submitter
                Actin, gamma, cytoskeletal

LIBRARY
Lib Name:       Hippocampus, Stratagene (cat. #936205)
Organism:       Homo sapiens
Vector:         lambdaZAP-II
Description:    Female, 2 years; oligo-dT + random primed cDNA synthesis;
                lambdaZAP-II vector, 1.0kb average insert size.

SUBMITTER
Name:           Kerlavage AR
Lab:            Bioinformatics
Institution:    The Institute for Genomic Research
Address:        9712 Medican Center Drive, Rockville, MD 20850 USA
Tel:		3018699056
Fax:		3018699423
E-mail:		arkerlav@tigr.org

CITATIONS
Medline UID:	91262645
Title:          Complementary DNA sequencing: expressed sequence tags and human
                genome project
Authors:        Adams,M.D., Kelley,J.M., Gocayne,J.D., Dubnick,M.,
                Polymeropoulos,M.H., Xiao,H., Merril,C.R., Wu,A., Olde,B.,
                Moreno,R.F., etal
Citation:	Science 252: 1651-6 1991

MAP DATA

NEIGHBORS

Top 15 protein matches

Neighbor:       gi|1703 (X60733) gamma non-muscle actin [Oryctolagus cuniculus]
                gi|231506|sp|P29751|ACTB_RABIT ACTIN, CYTOPLASMIC 1
                (BETA-ACTIN). gi|279668|pir||ATRBB actin beta - rabbit
Pvalue:		9.309e-83

Neighbor:       gi|576368|pdb|2BTF|A Beta-Actin-Profilin Complex
Pvalue:		9.309e-83

Neighbor:       gi|537596 (M24769) actin [Xenopus laevis] gi|280660|pir||A43552
                actin - African clawed frog
Pvalue:		9.309e-83

Neighbor:       gi|309090 (J04181) A-X actin [Mus musculus]
                gi|90260|pir||A31900 actin A(X) - mouse
Pvalue:		9.309e-83

Neighbor:       gi|28252 (X00351) beta-actin [Homo sapiens] gi|177968 (M10277)
                cytoplasmic beta actin [Homo sapiens] gi|49866 (X03672)
                beta-actin (aa 1-375) [Mus musculus] gi|55575 (V01217)
                beta-actin [Rattus norvegicus] gi|211237 (L08165) beta-actin
                [Gallus gallus] gi|113270|sp|P02570|ACTB_HUMAN ACTIN,
                CYTOPLASMIC 1 (BETA-ACTIN). gi|71618|pir||ATHUB actin beta -
                human gi|71619|pir||ATMSB actin beta - mouse
                gi|279669|pir||ATCHB actin beta - chicken
Pvalue:		9.309e-83

Neighbor:       gi|28339 (X04098) gamma-actin [Homo sapiens] gi|178043 (M19283)
                gamma-actin [Homo sapiens] gi|57574 (X52815) cytoskeletal
                gamma-actin (AA 1-375) [Rattus rattus] gi|309089 (M21495)
                gamma-actin [Mus musculus] gi|113278|sp|P02571|ACTG_HUMAN ACTIN,
                CYTOPLASMIC 2 (GAMMA-ACTIN). gi|71623|pir||ATHUG actin gamma -
                human gi|71624|pir||ATMSG actin gamma - mouse
                gi|111332|pir||S11222 actin gamma, cytoskeletal - rat
Pvalue:		9.309e-83

Neighbor:       gi|202654 (J00691) cytoplasmic beta actin [Rattus norvegicus]
                gi|71620|pir||ATRTC actin beta - rat
Pvalue:		9.309e-83

Neighbor:       gi|1334642|gnl|PID|e184505 (X07507) actin [Xenopus borealis]
                gi|113271|sp|P15475|ACTB_XENBO ACTIN, CYTOPLASMIC TYPE 1 (BETA
                ACTIN). gi|85691|pir||S01077 actin beta, cytoskeletal - Kenyan
                clawed frog
Pvalue:		9.309e-83

Neighbor:       gi|213273 (M26111) beta-actin [Anser anser]
                gi|113267|sp|P14104|ACTB_ANSAN ACTIN, CYTOPLASMIC BETA.
                gi|627304|pir||A55001 actin beta - goose
Pvalue:		9.309e-83

Neighbor:       gi|63018 (X00182) beta-actin [Gallus gallus]
Pvalue:		9.309e-83

Neighbor:       gi|761724 (U20114) beta-actin [Cricetulus griseus]
                gi|1351867|sp|P48975|ACTB_CRIGR ACTIN, CYTOPLASMIC 1
                (BETA-ACTIN).
Pvalue:		9.309e-83

Neighbor:       gi|71621|pir||ATBOB actin beta - bovine (tentative sequence)
Pvalue:		9.309e-83

Neighbor:       gi|71625|pir||ATBOG actin gamma - bovine (tentative sequence)
Pvalue:		9.309e-83

Neighbor:       gi|809561 (X13055) gamma-actin [Mus musculus]
Pvalue:		9.786e-83

Neighbor:       gi|49868 (X03765) put. beta-actin (aa 27-375) [Mus musculus]
                gi|387083 (M12481) cytoplasmic beta-actin [Mus musculus]
Pvalue:		1.029e-82


Top 15 nucleotide matches

Neighbor:       gi|28251|emb|X00351|HSAC07 Human mRNA for beta-actin
Pvalue:		3.325e-149

Neighbor:       gi|28335|emb|X63432|HSACTB H.sapiens ACTB mRNA for mutant
                beta-actin (beta'-actin)
Pvalue:		3.325e-149

Neighbor:       gi|476331|gb|U07786|SSU07786 Sus scrofa beta actin mRNA,
                partial cds.
Pvalue:		2.014e-129

Neighbor:       gi|178044|gb|M16247|HUMACTGAA Human gamma-actin mRNA, partial
                cds.
Pvalue:		3.857e-129

Neighbor:       gi|28338|emb|X04098|HSACTCGR Human mRNA for cytoskeletal
                gamma-actin
Pvalue:		6.359e-129

Neighbor:       gi|1702|emb|X60733|OCRNAGNMA O.cuniculus mRNA for gamma-non
                muscle actin
Pvalue:		2.003e-127

Neighbor:       gi|191660|gb|J04181|MUSACTMEL Mouse A-X actin mRNA, complete
                cds.
Pvalue:		1.144e-123

Neighbor:       gi|49865|emb|X03672|MMACTBR Mouse cytoskeletal mRNA for
                beta-actin
Pvalue:		1.202e-123

Neighbor:       gi|191581|gb|M12481|MUSACCYB Mouse cytoplasmic beta-actin mRNA.
Pvalue:		1.030e-121

Neighbor:       gi|49867|emb|X03765|MMACTBR2 Mouse mRNA for cytoplasmatic
                beta-actin (pAL 41; AA 27-375)
Pvalue:		1.698e-121

Neighbor:       gi|213272|gb|M26111|GOOACTB Goose beta-actin mRNA, complete
                cds.
Pvalue:		2.180e-121

Neighbor:       gi|567191|gb|L36342|MOZBEAC Morone saxatilis (striped bass)
                beta-actin mRNA, partial cds.
Pvalue:		2.655e-120

Neighbor:       gi|211236|gb|L08165|CHKBACTN Gallus gallus beta-actin mRNA,
                complete cds.
Pvalue:		3.392e-118

Neighbor:       gi|57573|emb|X52815|RRGAMACT Rat mRNA for cytoplasmic-gamma
                isoform of actin
Pvalue:		1.184e-117

Neighbor:       gi|51042|emb|X13055|MMGACTR Murine mRNA for cytoplasmic
                gamma-actin
Pvalue:		1.952e-117
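
As a rough idea of what combining the text and the score for each hit could
look like, here is a sketch in Python (not in the SRS parser language). It
assumes the layout shown above: a "Neighbor:" line, optional indented
continuation lines, then a "Pvalue:" line. The name parse_neighbors is
invented for the example.

```python
def parse_neighbors(lines):
    """Return (description, pvalue) pairs from the NEIGHBORS part of one entry."""
    hits = []
    text = None  # description of the hit being read, or None
    for raw in lines:
        line = raw.rstrip()
        if line.startswith("Neighbor:"):
            text = line[len("Neighbor:"):].strip()
        elif line.startswith("Pvalue:") and text is not None:
            hits.append((text, float(line[len("Pvalue:"):].strip())))
            text = None
        elif text is not None and line[:1] in (" ", "\t"):
            # Indented line: the description continues.
            text += " " + line.strip()
    return hits


entry = """\
Neighbor:       gi|309090 (J04181) A-X actin [Mus musculus]
                gi|90260|pir||A31900 actin A(X) - mouse
Pvalue:\t\t9.309e-83

Neighbor:       gi|63018 (X00182) beta-actin [Gallus gallus]
Pvalue:\t\t9.309e-83
""".splitlines()

for description, pvalue in parse_neighbors(entry):
    print(pvalue, description)
```

Keeping each description together with its P value is what would let a
search ask for hits below a given threshold.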
--
------------------------------------------------------------------------
Peter Rice                           | Informatics Division,
E-mail: pmr@sanger.ac.uk             | The Sanger Centre,
Tel: (44) 1223 494967                | Wellcome Trust Genome Campus,
Fax: (44) 1223 494919                | Hinxton, Cambridge, CB10 1SA,
URL: http://www.sanger.ac.uk/~pmr/   | England

From owner-srs@net.bio.net Sun Nov 10 22:00:00 1996
Path: biosci!rutgers!mcrcr6!cmcl2!news.nyu.edu!mcrcr1.med.nyu.edu!user
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: searching in comment field
Message-ID: <browns02-1111961631380001@mcrcr1.med.nyu.edu>
From: browns02@mcrcr.med.nyu.edu (Stuart M. Brown)
Date: Mon, 11 Nov 1996 16:31:38 -0500
References: <browns02-3010961305560001@mcrcr1.med.nyu.edu>
 <558p26$pgc@dismay.ucs.indiana.edu>
 <browns02-0611961151380001@mcrcr1.med.nyu.edu>
 <PMR.96Nov8104710@unst.sanger.ac.uk>
Organization: NYU-MC Research Computing Resource
NNTP-Posting-Host: mcrcr1.med.nyu.edu
Lines: 44
Xref: biosci bionet.software.gcg:2097 bionet.software.srs:341

> >   I believe that virtually all ESTs are BLAST searched before being
> >   submitted to the database, and I have heard a statistic that about 40% of
> >   them find one or more significant hits.  You should get hundreds of
> >   thousands of hits, no?
> >   Where is this information stored if it is not in the comment or
> >   description fields??
> 
> The information about BLAST hits is generated by NCBI and included in
> the dbEST database (in the dbEST.reports file). The BLAST hit
> information is updated from time to time, but as far as I am aware it
> is not included in the GenBank entries unless it is used to clearly
> identify the EST.
> 
> The dbEST.reports is available through SRS WWW servers at a number of
> sites.
> 
> Meanwhile, I am working on the new SRS 5.0 parsing for dbEST, and will
> certainly be trying to index the blast hits in some way. This gets a
> little tricky - for example, it is not trivial to combine the scores
> and the text for a given hit though it should be possible.
> 
> So, an obvious question: what information would you like to search for
> in the BLAST hit fields?

This is really agonizing.  Here is all of this beautiful data, but apparently no
good way to use it.  I don't think that the SRS indices should be expanded
to include 15 protein and 15 nucleotide hits (and their names and the
significance level of each hit).  It already takes us anywhere from 6 to 36
hours to recreate the SRS indices on our GCG system after each full GenBank
update (and this is on a fast Alpha machine!).  Perhaps the time is ripe for
a new tool - sort of a reverse BLASTer that takes a given sequence and
identifies all ESTs that mention that sequence in their BLAST report.

Think about it - here is all of this information about ESTs, but unless you
already know the accession # of a particular EST, you will never see it -
so everyone has to do the BLAST against the ESTs for themselves without
knowing that the ESTs have already been compared with their sequence.
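
To make the idea concrete, here is a rough sketch of the reverse lookup in
Python. It assumes the hits have already been pulled out of the dbEST
reports as (EST name, hit description) pairs, and that a sequence of
interest is identified by its accession number as it appears in the hit
lines of the example entry posted earlier in this thread. The accession
pattern, the function name and the second EST are all assumptions made up
for the example.

```python
import re
from collections import defaultdict

# Accession numbers as they appear in the hit lines: "(X00351)" in protein
# hits and "|emb|X00351|" in nucleotide hits. The pattern is an assumption.
ACCESSION = re.compile(
    r"\(([A-Z]{1,2}\d{5,6})\)|\|(?:gb|emb|dbj)\|([A-Z]{1,2}\d{5,6})\|"
)


def build_reverse_index(hits):
    """Map each accession number to the set of ESTs whose report mentions it."""
    index = defaultdict(set)
    for est_name, description in hits:
        for paren_acc, piped_acc in ACCESSION.findall(description):
            index[paren_acc or piped_acc].add(est_name)
    return index


# (EST name, hit description) pairs; EST00002 is invented for the example.
hits = [
    ("EST00001", "gi|28252 (X00351) beta-actin [Homo sapiens] gi|177968 (M10277)"),
    ("EST00001", "gi|28251|emb|X00351|HSAC07 Human mRNA for beta-actin"),
    ("EST00002", "gi|28251|emb|X00351|HSAC07 Human mRNA for beta-actin"),
]
index = build_reverse_index(hits)
print(sorted(index["X00351"]))
```

With something like this built once per dbEST release, looking up one's own
accession number would list the ESTs that already hit it, without running
BLAST again.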

-- 
Stuart M. Brown, Molecular Biology Consultant 
NYU-MC Research Computing Resource, Dept. of Cell Biology
550 First Ave, New York, NY 10016
Phone: (212)263-7689  FAX: (212)263-8139

From owner-srs@net.bio.net Mon Nov 11 22:00:00 1996
Path: biosci!bcm.tmc.edu!cs.utexas.edu!www.nntp.primenet.com!nntp.primenet.com!nntp.uio.no!nntp.zit.th-darmstadt.de!fu-berlin.de!news-ber1.dfn.de!news-ham1.dfn.de!news-han1.dfn.de!news.dfn.de!news.embl-heidelberg.de!usenet
From: Thure Etzold <etzold@embl-heidelberg.de>
Newsgroups: bionet.software.gcg,bionet.software.srs
Subject: Re: searching in comment field
Date: Tue, 12 Nov 1996 13:40:32 +0100
Organization: EMBL
Lines: 47
Distribution: world
Message-ID: <3288703F.41C6@embl-heidelberg.de>
References: <browns02-3010961305560001@mcrcr1.med.nyu.edu>
	 <558p26$pgc@dismay.ucs.indiana.edu>
	 <browns02-0611961151380001@mcrcr1.med.nyu.edu>
	 <PMR.96Nov8104710@unst.sanger.ac.uk> <browns02-1111961631380001@mcrcr1.med.nyu.edu>
NNTP-Posting-Host: kappa.embl-heidelberg.de
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 3.0 (X11; I; OSF1 V2.0 alpha)
To: "Stuart M. Brown" <browns02@mcrcr.med.nyu.edu>
Xref: biosci bionet.software.gcg:2098 bionet.software.srs:342

Stuart M. Brown wrote:

> > Meanwhile, I am working on the new SRS 5.0 parsing for dbEST, and will
> > certainly be trying to index the blast hits in some way. This gets a
> > little tricky - for example, it is not trivial to combine the scores
> > and the text for a given hit though it should be possible.
> >
> > So, an obvious question: what information would you like to search for
> > in the BLAST hit fields?
> 
> This is really agonizing.  Here is all of this beautiful data, but apparently no
> good way to use it.  I don't think that the SRS indices should be expanded
> to include 15 protein and 15 nucleotide hits (and their names and the
> significance level of each hit).  It already takes us anywhere from 6 to 36
> hours to recreate the SRS indices on our GCG system after each full GenBank
> update (and this is on a fast Alpha machine!).  Perhaps the time is ripe for
> a new tool - sort of a reverse BLASTer that takes a given sequence and
> identifies all EST's that mention that sequence in their BLAST report.
> 

I wouldn't be so pessimistic about using the data in dbEST. The protein and
nucleotide hits could even be indexed as subentries, so it would be possible
to search them independently of the entries in which they occur. Questions
like "find me all ESTs that are similar above a certain threshold to a
certain DNA sequence" would then be possible.

Indexing in SRS4 is a problem since the memory usage can be enormous,
depending on the size of the databank. The reason that, in your experience,
indexing can take from 6 to 36 hours is probably swapping: once SRS runs out
of physical memory the computer must start swapping, and that slows indexing
down almost to a halt.

In SRS5 that problem is solved by indexing large databanks in chunks and
merging them later. A good candidate - and one of the reasons I did it - is
dbEST.
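
A rough illustration of the chunk-and-merge idea, in Python. This is not
SRS code, and the function names and the chunk size are invented. Each
chunk of entries is indexed on its own, so only one chunk's worth of keys
has to be sorted at a time, and the sorted partial indices are merged at
the end.

```python
import heapq
from itertools import islice


def index_chunk(entries):
    """Build a sorted list of (word, entry_id) pairs for one chunk of entries."""
    pairs = {
        (word.lower(), entry_id)
        for entry_id, text in entries
        for word in text.split()
    }
    return sorted(pairs)


def index_in_chunks(entries, chunk_size):
    """Index entries chunk by chunk, then merge the sorted partial indices."""
    it = iter(entries)
    partial = []
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        # A real indexer would write each partial index to disk here.
        partial.append(index_chunk(chunk))
    # heapq.merge streams through the sorted inputs without re-sorting them.
    return list(heapq.merge(*partial))


entries = [
    ("EST00001", "actin gamma cytoskeletal"),
    ("EST00002", "beta actin"),
    ("EST00003", "gamma tubulin"),
]
for word, entry_id in index_in_chunks(entries, chunk_size=2):
    print(word, entry_id)
```

The merged result is the same as indexing everything in one go; only the
peak memory needed differs.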

regards
Thure

...latest prediction for the final release is end of November - am
currently working on displaying subentries

From owner-srs@net.bio.net Mon Nov 11 22:00:00 1996
Path: biosci!daresbury!bioftp.unibas.ch!infobiogen.fr!newsmaster
From: Jean-Marc Plaza <plaza@infobiogen.fr>
Newsgroups: bionet.software.srs
Subject: Hyperlink problem
Date: Tue, 12 Nov 1996 16:30:38 +0100
Organization: INFOBIOGEN
Lines: 93
Message-ID: <3288981E.5E89@infobiogen.fr>
NNTP-Posting-Host: lovelace.infobiogen.fr
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 2.02 (X11; I; SunOS 5.5.1 sun4d)

Hello SRS managers,
I have a problem making a hyperlink field in virgil.
When I build a list displaying the GDBuid field,
the hyperlink is OK:
********************************************
     VIRGIL:14582 
     IDa    gdb:118728; ANPEP; 31-MAY-1993
                ^^^^^^
                hyperlink is OK
********************************************

Then when I display the whole entry, the hyperlink is lost.

********************************************
TYP    vgl:14582; GDB[gene] * Genbank[sequence]
OR1    genXref; v1.0; 0.61; 01-AUG-1996
OR2    GDB; v6.1; undef; undef
xxx
IDa    gdb:118728; ANPEP; 31-MAY-1993           <---- no more hyperlink
DFa-   alanyl (membrane) aminopeptidase (aminopeptidase N,
DFa    aminopeptidase M, microsomal aminopeptidase, CD13, p150)
xxx
IDb    gbk:M22324; HUMAMIPEP; 31-OCT-1994
DFb-   Human aminopeptidase N/CD13 mRNA encoding aminopeptidase
DFb    N, complete cds. 
********************************************

This could also occur in other databanks.
Does anyone have a clue about my problem?

Thanks in advance

--------------------------------------

Here is an extract of virgil.sdl.
Note that the IDa field is indexed in DF_ID1 but also in DF_DBI and
DF_LINK.

 #libformat 
......
    ! database id
    !------------
    #field
      /itype=key  /ftype=@DF_DBI  /idtype=@SRSxSEQID
      /begstr="TYP", "IDa", "IDb"       /find=db_id
      /maxlines=1

    ! object 1
    !---------
    #field
      /gblid=%VIRGIL_GDB
      /itype=key  /ftype=@DF_ID1 /idtype=@SRSxSEQID
      /begstr="IDa"      /find=id      /maxlines=1

    ! link to MIMMAP
    !-----------------
    #field
      /itype=link  /ftype=@DF_LINK  /idtype=@SRSxLINKID
      /begstr="IDa"  /maxlines=1 /find=link2mimmap

#parser
......
  id = [ vgl_id | gbk_id | gdb_id ];
  gbk_id =  'gbk' ':'  ~A-Za-z0-9~ <wrt>;
  vgl_id =  'vgl' ':'  ~0-9~ <wrt>;
  gdb_id =  'gdb' ':'   ~0-9~ <wrt>;

  db_id = [ vgl_lnk | gbk_lnk | gdb_lnk ];
  gbk_lnk =  'gbk:'<wrt>   accno <app>;
  gdb_lnk =  'gdb:'<wrt>   uid <app>;
  vgl_lnk =  'vgl:'<wrt>   uid <app>;

  link2mimmap = [ gdb_map | gbk_map ];
  gdb_map     = 'gdb' ':' ~A-Z0-9a-z~ ';'  ~A-Z0-9a-z_\-~ <wrt c=@MIMMAP_REF>;
  gbk_map     = 'gbk' ':' ~A-Z0-9a-z~ ';'  ~A-Z0-9a-z_\-~ <wrt c=@MIMMAP_REF>;


extract of hyperlink.sdl:
....
#hyperlink /field=@VIRGIL_GDB
    /parse=gdb_id /parser=@INSERTLINK_PARSER
.....
  gdb_id = ~A-Za-z0-9~ 'gdb' ':'  ~0-9~ <wrt c=@FETCH_GDB f=@F_INSERTHLINK>;


---------------------------------------------------
Jean-Marc PLAZA
INFOBIOGEN - CNRS
7, rue Guy Moquet BP8 94801 VILLEJUIF Cedex, France
tel: +33 45 59 52 39  fax: +33 45 59 52 50
e-mail: plaza@infobiogen.fr
---------------------------------------------------

From owner-srs@net.bio.net Tue Nov 12 22:00:00 1996
Path: biosci!agate!howland.erols.net!EU.net!Ireland.EU.net!web3.tcd.ie!gen035.gen.tcd.ie!user
From: atlloyd@acer.gen.tcd.ie (Andrew T. Lloyd)
Newsgroups: bionet.software.srs
Subject: SRS 4.08 indexing
Date: Wed, 13 Nov 1996 09:55:46 +0000
Organization: INCBI, Trinity College, Dublin 2, Ireland.
Lines: 28
Message-ID: <atlloyd-1311960955460001@gen035.gen.tcd.ie>
NNTP-Posting-Host: gen035.gen.tcd.ie

I have a minimal SRS setup here: SW PIR PROSITE EMBL-without-ESTs
but have problems getting the links sorted:
the links from SW - EMBL are looking for the NI field and report

INFO:
no entries found

I see that the EBI and Sanger have changed things so that the
link is seeking the AC field and successfully finds it.  
Another large site in Hinxton appears to have the same problem
as I do.

Can some kind, patient soul tell me what I have to tweak ?

I also note that I have acquired (through no effort of mine!)
a link from SW to MIM at wwwgdb.gdb.org but that this is asking
for http://wwwgdb.gdb.org/omim/omimx which is "not found on
this server".  This is something in hyperlink.sdl which I should
be able to figure out for myself, but I cannot solve my first
problem with grep!

Thanks,
Andrew

-- 
Andrew T. Lloyd  Irish National Centre for BioInformatics  INCBI
atlloyd@acer.gen.tcd.ie                   http://acer.gen.tcd.ie
Tel: (+353)-1-608-1969    EMBnet Ireland    Fax: (+353)-679-8558

From owner-srs@net.bio.net Tue Nov 12 22:00:00 1996
Path: biosci!bcm.tmc.edu!news.msfc.nasa.gov!newsfeed.internetmci.com!www.nntp.primenet.com!nntp.primenet.com!nntp.uio.no!nntp.zit.th-darmstadt.de!fu-berlin.de!news-ber1.dfn.de!news-ham1.dfn.de!news-han1.dfn.de!news.dfn.de!news.embl-heidelberg.de!usenet
From: Ramu Chenna <chenna>
Newsgroups: bionet.software.srs
Subject: Re: Hyperlink problem
Date: 13 Nov 1996 13:25:06 GMT
Organization: EMBL Heidelberg
Lines: 20
Distribution: world
Message-ID: <56ci7i$jej@lion.embl-heidelberg.de>
References: <3288981E.5E89@infobiogen.fr>
NNTP-Posting-Host: shag.embl-heidelberg.de
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-Mailer: Mozilla 1.12IS (X11; I; IRIX 5.3 IP22)
X-URL: news:3288981E.5E89@infobiogen.fr

Hello,
With the old SRS, the "complete entries" option on the result page
looks like it was designed for saving the entries to your disk
(without hyperlinks, of course) for further processing,
so you lose the hypertext link when you click it.

The workaround is just to select all the fields to display from the query
form page, and your hyperlink is there...
Hope this helps.

Ramu

===============================================================
Ramu C                                 | EMBL
E-mail: chenna@embl-heidelberg.de      | Postfach 10.2209
Tel: (49) 6221 356229 (Res)            | 69012 Heidelberg
Fax: (49) 6221 387517                  | Germany
http://www.embl-heidelberg.de/~chenna/
-------------------------------------------------------------


From owner-srs@net.bio.net Mon Nov 18 22:00:00 1996
Path: biosci!internet!biosci!not-for-mail
From: biohelp (BIOSCI Administrator)
Newsgroups: bionet.software.srs
Subject: BIOSCI/bionet miniFAQ & Fundraiser
Date: 19 Nov 1996 02:00:42 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 239
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <199611191000.CAA16163@net.bio.net>
NNTP-Posting-Host: net.bio.net

(LAST REVISION: 30-JUL-95)

This BIOSCI "miniFAQ" is designed to answer the questions that come up
the *most frequently*.  The main BIOSCI FAQ (Frequently Asked
Questions) is accessible on the World Wide Web at URL
http://www.bio.net/.

If you can not find an answer to your question in this or other
documentation, the BIOSCI technical support staff answers e-mail
queries sent to

		       biosci-help@net.bio.net

We can only answer questions about the use of the newsgroups and
mailing lists.  We unfortunately do not have the staff to do Internet
information searches or answer scientific questions.  Please post
those to the appropriate BIOSCI/bionet newsgroups.


	Contents:
	--------
	0) BIOSCI NEEDS YOUR SUPPORT!!

	1) Using the WWW to access the BIOSCI/bionet newsgroups.

	2) What to do about "spams," i.e., junk mail, ads, etc.

	3) Examples of subscribing and unsubscribing to the mailing lists.

	4) The BIOSCI user address and research interest directory.


0) BIOSCI NEEDS YOUR SUPPORT!!
------------------------------
BIOSCI's government funding has been expended, and we are now
operating solely from advertising revenue that we have raised from our
Web site at http://www.bio.net/.  We need just a few minutes of your
time to help us serve you.

You can do two important things which will take very little time for
you individually and will immensely help us continue to help you.

First, please use our WWW system at http://www.bio.net/ to access the
archives.  You can post or reply to messages via your Web browser as
described in item #1 below.  Your usage helps attract sponsors. If you
contact any of our sponsors, please be sure to thank them for
supporting BIOSCI. It is critical for them to get this feedback if
they are to continue their sponsorship for the long term.

Second, if you work for a company or organization that provides
products or services of interest to the biology community, please pass
this message on to your marketing or marketing communications
department or other appropriate group.  Please ask them to help
support BIOSCI by sponsoring our Web site and explain the uses and
benefits of the system to the biology community. If they are
interested, they can then contact us for further information at our
tech support address, biosci-help@net.bio.net.


1) Using the WWW to access the BIOSCI/bionet newsgroups.
--------------------------------------------------------
As of 10 December 1995, all BIOSCI/bionet full newsgroups are
accessible through the World Wide Web (WWW) at URL http://www.bio.net.
One can read and reply publicly or privately to both recent postings
and archived messages through one's Web browser if it is configured
properly to send e-mail.  Each newsgroup is equipped with its own WAIS
index.  The main BIOSCI home page also has access to the BIO-JOURNALS
Table of Contents database WAIS index and the BIOSCI user address
database described in another item further below.


2) What to do about "spams," i.e., junk mail, ads, etc.
-------------------------------------------------------
BIOSCI is a set of parallel USENET newsgroups (the "bionet" groups),
mailing lists, and a hypermail archive at URL http://www.bio.net/.
The same postings are distributed on all media (except for a small
number of mailing-list-only groups at net.bio.net).  Unfortunately it
is becoming a despicable practice on the Internet (by a few people out
to make a fast buck) to do automated mass postings to thousands of
newsgroups and mailing lists.  These attempts to grab free advertising
are referred to as "spams" in the usual, somewhat boneheaded, net
terminology.  USENET is more susceptible to this practice, and many
spams originate on the USENET groups and then are passed on to the
mailing lists.  However, spammers also get lists of mailing addresses
and hit these too, so neither medium is immune.

What should you do personally if you get junk mail?
---------------------------------------------------
Just delete it and move on without reading it further.  Filing a
protest is becoming increasingly useless because spammers are often
disguising the addresses where the messages are sent from.  Unless you
really understand Internet mail systems, your attempt at protest by
sending replies to the message will often end up being sent to the
address of an innocent person that the spammer is victimizing.

What can BIOSCI/bionet do to protect its newsgroups?
----------------------------------------------------
The only solution currently available is to moderate the newsgroup.
If this newsgroup is already moderated, then you are in good shape.
Moderation protects the USENET distribution from about 95% of the
spams that are being sent to date and protects the mailing lists
completely.  Moderation means, however, that someone has to take the
time to review each message before it goes out.  We have set up
software here that simply allows the moderator to forward to an
address at net.bio.net messages that (s)he wishes to have distributed.
This takes no more time than that needed to read the message and pass
it on, say about 1 min. per message.

Most newsgroups currently have a discussion leader who is responsible
for their newsgroup.  The discussion leaders and their e-mail
addresses are listed in the BIOSCI Information Sheet which is
available on the Web at http://www.bio.net/.  If a newsgroup is being
hit with too many junk postings, please contact the discussion leader
for that group and see if there is interest in moderating the group.
Please do not assume that by simply posting a complaint to the
newsgroup itself, anyone on the BIOSCI staff will act on your
complaint.  With close to 100 newsgroups to run, the BIOSCI staff has
to rely on the discussion leaders of each newsgroup to report problems
directly to us at biosci-help@net.bio.net.

We will moderate any of our newsgroups if the discussion leader tells
us that the readership of the group wishes to do so and if a moderator
is willing to do the work.  For most BIOSCI/bionet groups, this
entails only a few minutes of work each day.

Moderating a newsgroup will resolve probably 95% of the junk postings
on the USENET distribution.  Unfortunately there are easy ways for
determined spammers to override the moderation mechanism on USENET,
but we can protect our e-mail subscribers from unwanted postings if
the newsgroup is moderated.  You can also access our newsgroups over
the WWW at URL http://www.bio.net.  While this Web interface will not
stop spammers from trying to post to the groups, this will give you
yet another way, besides using USENET news, to keep the junk out of
your personal mail files.  For those of you with local USENET news
systems, the Web interface will also give you faster access to new
newsgroups and recent postings.


3) Examples of subscribing and unsubscribing to the mailing lists.
------------------------------------------------------------------
PLEASE NOTE: The BIOSCI management does NOT act on
subscription/unsubscription requests that are posted improperly to the
newsgroups and mailing lists.  People who do this only bother everyone
on the lists to no avail.  Please be sure to follow the proper
procedures below.

Gory details are in the BIOSCI Information sheets on the Web at
http://www.bio.net.  Below we give an example utilizing the
METHODS-AND-REAGENTS list at both of our two BIOSCI sites:

Users in the Americas and Pacific Rim countries who use the BIOSCI
------------------------------------------------------------------
node at computer net.bio.net:
----------------------------

A) Determine the "listname" which is the <=8 character mail address
                                         ^^^^^^^^^^^^^
   for the group.  These can be found in the BIOSCI Info. Sheet.  For
   the METHODS-AND-REAGENTS group the mailing address is
   methods@net.bio.net.  The listname is the portion of the address to
   the left of the @ sign, i.e., "methods".  The listname is used with
   the "subscribe" and "unsubscribe" commands illustrated below.

B) Mail all commands in the body of a mail message addressed to
   biosci-server@net.bio.net.  Do NOT send commands to the newsgroup
   posting addresses!  Leave the Subject: line blank; any text on it
   will be ignored.

C) In the body of your message put one or more of the following
   commands with an "end" command on the last line, e.g.,

   subscribe methods
   unsubscribe methods
   end

   Do NOT put your e-mail address or other text on these lines.  The
   server only allows you to cancel your subscription if the address
   on your mail header matches the address on our mailing list.
   Please ask for help at biosci-help@net.bio.net if your address has
   changed, e.g., if you know you are on the list but the server tells
   you that you are not a member.
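
For readers who script their mail, steps A) to C) can be sketched in
Python as below.  This is only a minimal sketch: the function name
build_command_message and the sender address are made up for
illustration, and actually sending the message is left out.  Only the
server address, the two command words, and the closing "end" line are
taken from the instructions above.

```python
from email.message import EmailMessage

SERVER_ADDRESS = "biosci-server@net.bio.net"  # from step B above


def build_command_message(sender, commands):
    """Build a mail message carrying list-server commands in its body.

    `commands` is a list of (action, listname) pairs, for example
    [("subscribe", "methods")].  This helper is hypothetical and is
    not part of any BIOSCI software.
    """
    body_lines = []
    for action, listname in commands:
        if action not in ("subscribe", "unsubscribe"):
            raise ValueError("unknown command: %r" % (action,))
        # The listname is the <=8 character part left of the @ sign.
        if not listname or len(listname) > 8 or "@" in listname:
            raise ValueError("bad listname: %r" % (listname,))
        body_lines.append("%s %s" % (action, listname))
    body_lines.append("end")  # the last line must be "end"

    msg = EmailMessage()
    msg["From"] = sender  # must match the address on the mailing list
    msg["To"] = SERVER_ADDRESS
    # No Subject header is set; the server ignores that line anyway.
    msg.set_content("\n".join(body_lines) + "\n")
    return msg
```

Calling build_command_message("you@example.org", [("subscribe",
"methods")]) gives a message whose body is "subscribe methods"
followed by "end".  The message still has to be sent from the address
that is (or should be) on the list.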


Users in Europe, Africa, and Central Asia who use the BIOSCI node at
--------------------------------------------------------------------
computer daresbury.ac.uk (also known as dl.ac.uk):
-------------------------------------------------

To subscribe and unsubscribe to/from the BIOSCI lists, you need to
specify the full USENET newsgroup name with "bionet-news." prepended.
The USENET newsgroup names are listed in the BIOSCI Information sheet
on the Web at http://www.bio.net/.  For the METHODS-AND-REAGENTS list
the USENET newsgroup name is bionet.molbio.methds-reagnts, thus the
appropriate commands are

    sub bionet-news.bionet.molbio.methds-reagnts

    unsub bionet-news.bionet.molbio.methds-reagnts

These commands are included in a message addressed to mxt@dl.ac.uk,
NOT to the newsgroup mailing addresses.  As usual, include the text in
the body of the message, as text on the Subject: line is ignored.

To unsubscribe from all the lists at the UK node, use

    unsub bionet-news

Please note that if the address in the list is different from the one
in your mail message header, you will not be able to unsubscribe by
this method. If you have problems, please mail biosci@daresbury.ac.uk.
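
The command format for the UK node can be sketched in Python in the
same spirit.  The function name uk_command is invented for
illustration; only the "sub"/"unsub" words, the "bionet-news." prefix,
and the mxt@dl.ac.uk address come from the text above.

```python
UK_SERVER_ADDRESS = "mxt@dl.ac.uk"  # commands go here, not to the newsgroup
PREFIX = "bionet-news."


def uk_command(action, newsgroup=None):
    """Return one command line for the UK node (hypothetical helper).

    action    -- "sub" or "unsub"
    newsgroup -- full USENET name, e.g. "bionet.molbio.methds-reagnts";
                 None is accepted only with "unsub" and means all lists.
    """
    if action not in ("sub", "unsub"):
        raise ValueError("unknown command: %r" % (action,))
    if newsgroup is None:
        if action != "unsub":
            raise ValueError("a newsgroup name is required for 'sub'")
        return "unsub bionet-news"
    if not newsgroup.startswith("bionet."):
        raise ValueError("expected a full bionet.* newsgroup name")
    return "%s %s%s" % (action, PREFIX, newsgroup)
```

For example, uk_command("sub", "bionet.molbio.methds-reagnts") yields
the first command line shown above, and uk_command("unsub") yields the
line that cancels all lists.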


4) The BIOSCI user address and research interest directory.
-----------------------------------------------------------
Please take this opportunity to add your name, address, and research
interest information to the BIOSCI User Address Database if you have
not already done so.

You can fill out the address form directly through our Web page at URL
http://www.bio.net/adrform.html.

The address database is reindexed nightly for WWW access (the URL is
http://www.bio.net/).  If you are not directly on the Internet but can
reach it by e-mail, please use our waismail server to access the user
directory.  waismail use is described above.  You can also request a
user address form by e-mail from biosci-help@net.bio.net.

Please check your database entry from time to time to see if your
address information is still up to date.  Because of our limited
personnel resources, we ask that you resubmit a *complete* form to
revise your entry; we only replace complete entries and do not have
resources to edit old forms.

				Sincerely,

				Dave Kristofferson
				BIOSCI/bionet Manager

				biosci-help@net.bio.net

From owner-srs@net.bio.net Tue Nov 19 22:00:00 1996
Path: biosci!rutgers!uwm.edu!www.nntp.primenet.com!nntp.primenet.com!howland.erols.net!surfnet.nl!swidir.switch.ch!swsbe6.switch.ch!news.vub.ac.be!ben!gbottu
From: gbottu@ben.vub.ac.be (Guy Bottu)
Newsgroups: bionet.software.srs
Subject: Re: SRS 4.08 indexing
Date: 20 Nov 1996 19:11:58 GMT
Organization: Belgian EMBnet Node
Lines: 56
Message-ID: <56vl5u$7o4@rc1.vub.ac.be>
References: <atlloyd-1311960955460001@gen035.gen.tcd.ie>
NNTP-Posting-Host: ben.vub.ac.be
X-Newsreader: TIN [version 1.2 PL2]

Andrew T. Lloyd (atlloyd@acer.gen.tcd.ie) wrote:
: I have a minimal SRS setup here: SW PIR PRosite EMBL-without-ESTs
: but have problems getting the links sorted:
: the links from SW - EMBL are looking for the NI field and report

: INFO:
: no entries found

SwissProt contains lines of the following type:

DR   EMBL; M15203; G167391; -

So, they should be parsed correctly by the swissprot.sdl from
the distribution, since it contains:

    #field /gblid=%SWISSPROT_DR_FIELD
           /itype=link /ftype=@DF_LINK /idtype=@SRSxLINKID
           /begstr="DR   " /nextstr="DR EXIT " /maxlines=1
           /find=link
...

#readlink /id=%GENEMBL_REF /link=@SWISSPROT_EMBL_LINK,
                                     @SWISSPROT_GENBANK_LINK
...

link       = [emlink | pirlink | pdblink | proslink | rebaselink | mimlink];
emlink     = 'EMBL' ';' accno <wrt c=@GENEMBL_REF> ';';

...

#link /id=%SWISSPROT_EMBL_LINK
      /weight=10
      /lib1=@SWISSPROT_DB /lib2=@EMBL_DB
      /field1=@SWISSPROT_ID_FIELD /field2=@EMBL_ACC_FIELD
      /idtype1=@SRSxSEQID /idtype2=@SRSxSEQID

I really do not see what might go wrong...
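
To check by hand what the emlink production should pick up from such
a DR line, a rough Python equivalent may help.  It is only a sketch
and says nothing about how SRS or Icarus work internally; the function
names are invented for illustration.  It assumes the layout of the
example line above: "DR", three spaces, then fields separated by ";".

```python
def parse_dr_line(line):
    """Split a SwissProt DR line into (database, other_fields).

    Returns None if the line is not a DR line.  Hypothetical helper.
    """
    if not line.startswith("DR   "):
        return None
    parts = [p.strip() for p in line[5:].split(";")]
    parts = [p for p in parts if p]  # drop empty pieces
    if not parts:
        return None
    return parts[0], parts[1:]


def embl_accession(line):
    """Return the accession number from an EMBL DR line, else None.

    Mirrors the grammar rule: 'EMBL' ';' accno ';'
    """
    parsed = parse_dr_line(line)
    if parsed is None:
        return None
    database, fields = parsed
    if database != "EMBL" or not fields:
        return None
    return fields[0]  # the field right after 'EMBL;'
```

With the example line, embl_accession returns "M15203", i.e. the
accession number and not the third field (G167391).  If the link is
built on another field, that would be worth checking against the
field2 setting in the #link definition.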


: I also note that I have acquired (through no effort of mine!)
: a link from SW to MIM at wwwgdb.gdb.org but that this is asking
: for http://wwwgdb.gdb.org/omim/omimx which is "not found on
: this server"

I think either OMIM is no longer available from the GDB site or they
have changed the path. Anyway, I get OMIM from NCBI's Entrez site with:

/call="<a href=http://www3.ncbi.nlm.nih.gov/htbin-post/Omim/
  getmim?search=%s&field=number>%s</a>"

That does work.

I hope this will help you.

	Guy Bottu


From owner-srs@net.bio.net Wed Nov 20 22:00:00 1996
Path: biosci!ELIRIS.MED.YALE.EDU!lolis
From: lolis@ELIRIS.MED.YALE.EDU (Elias Lolis)
Newsgroups: bionet.software.srs
Subject: N-term analysis software
Date: 21 Nov 1996 13:21:45 -0800
Organization: BIOSCI International Newsgroups for Molecular Biology
Lines: 18
Sender: daemon@net.bio.net
Distribution: world
Message-ID: <3294C7B2.446B@eliris.med.yale.edu>
NNTP-Posting-Host: net.bio.net

Is there a program that can be used to compile a list of the frequencies
of individual amino acids at the N-terminus of mature proteins in the
database?

I am working on a cytokine with a proline at the N-terminus.  This
protein has structural (but not sequence) homology to 2 microbial
enzymes that use a proline at the N-terminus as a catalytic base.  There
are many similarities in the local environment of the proline between
the cytokine and the 2 microbial enzymes, and I am now investigating the
possibility that this cytokine has catalytic activity.  Toward that end,
I am interested in determining how often proline (relative to other
amino acids) occurs at the amino terminus.  Does anyone
know of a program that can do this or whether such a list has been
compiled?  It will be important for the program to distinguish between 
entire polypeptides and mature, processed forms of proteins.  Thanks.

Elias Lolis
lolis@eliris.med.yale.edu
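
The tally asked for above could look roughly like the following Python
sketch.  It is not an existing program.  It assumes that, for each
entry, the sequence and the 1-based start position of the mature chain
are already known (for instance from feature-table annotation, which
is an assumption here); the function names and the test sequences
are made up.

```python
from collections import Counter


def n_terminal_counts(records):
    """Tally the first residue of each mature protein.

    `records` yields (sequence, mature_start) pairs, where
    mature_start is the 1-based position of the first residue of the
    mature chain (1 if the polypeptide is not processed).
    """
    counts = Counter()
    for sequence, mature_start in records:
        # Skip entries whose annotated start lies outside the sequence.
        if 1 <= mature_start <= len(sequence):
            counts[sequence[mature_start - 1].upper()] += 1
    return counts


def frequencies(counts):
    """Convert raw counts into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {residue: n / total for residue, n in counts.items()}
```

Passing mature_start = 1 for every entry gives the tally for entire
polypeptides instead, so the two cases can be compared directly.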

