A nasty bug in the SRS link indexing

Martin Hilbers mph at dl.ac.uk
Tue Mar 3 12:13:49 EST 1998


Thure Etzold wrote:
> 
> > Dear SRS developers and fellow SRS server managers,
> >
> >While trying to implement a direct link between PIR and EMBL/GENBANK I
> >ran into trouble. After some searching I found that the cause was a bug
> >in the SRS software itself. It seems that when there is e.g. an Icarus
> >line :
> >$link:[@SWISSPROT_DB to:@?EMBL_DB token:'link|EMBL' toField:@DF_Accession]
> >not just tokens with code EMBL are put in the index, but all tokens.
> >
> 
> this problem has been reported before and i had problems reproducing it. it
> seems that it occurs rather infrequently and not as a rule.
> 
> >To convince yourself of this, try the following :
> >search in SWISSPROT the entry with ID HA12_MOUSE
> >then make the link to EMBL
> >you will see that besides the two correct EMBL entries you also find
> >A02201 which is not related to the sequence SWISSPROT:HA12_MOUSE but
> >happens to have the same accession number as the corresponding PIR
> >entry.
> >
> >SRS is great, but keeping it alive costs a lot of sweat and tears...
> 
> very sorry about that but your error report helps a lot!
> 
> regards
> thure

It may have something to do with the "-s unix" command line option
used when you run srsbuild via srscheck. Have a look at this:

# srsbuild swissprot -l -nn
...reading links to "GENBANK"
...reading links to "EMBL"
...reading links to "PIR"
...reading links to "PDB"
...reading links to "OMIM"
...processing /usr/local/gcg/data/gcgswissprot/swissprot.ref
...processing /usr/local/gcg/data/gcgswissprot/swissprot.seq
...wrote link from "SWISSPROT" to "GENBANK"
         valid references: 109413, invalid references: 158, 
         total number of links: 112514

...wrote link from "SWISSPROT" to "EMBL"
         valid references: 109396, invalid references: 175, 
         total number of links: 112116

...wrote link from "SWISSPROT" to "PIR"
         valid references: 47054, invalid references: 75, 
         total number of links: 39124

...wrote link from "SWISSPROT" to "PDB"
         valid references: 5753, invalid references: 51, 
         total number of links: 5753

...wrote link from "SWISSPROT" to "OMIM"
         valid references: 3760, invalid references: 0, 
         total number of links: 3760

...program "srsbuild" completed successfully.

Looks good. Test it:
# getz '[swissprot-id:HA12_MOUSE]>embl'
EMBL:MM190
EMBL:MMU47326

Now with "-s unix":
# srsbuild swissprot -l -nn -s unix
...reading links to "GENBANK"
...reading links to "EMBL"
...reading links to "PIR"
...reading links to "PDB"
...reading links to "REBASE"
...reading links to "OMIM"
...processing /usr/local/gcg/data/gcgswissprot/swissprot.ref
...processing /usr/local/gcg/data/gcgswissprot/swissprot.seq

...wrote link from "SWISSPROT" to "GENBANK"
         valid references: 125252, invalid references: 123397, 
         total number of links: 128143

...wrote link from "SWISSPROT" to "EMBL"
         valid references: 125155, invalid references: 123494, 
         total number of links: 127675

...wrote link from "SWISSPROT" to "PIR"
         valid references: 49253, invalid references: 199396, 
         total number of links: 41352

...wrote link from "SWISSPROT" to "PDB"
         valid references: 5753, invalid references: 242896, 
         total number of links: 5753

...wrote link from "SWISSPROT" to "OMIM"
         valid references: 3766, invalid references: 244883, 
         total number of links: 3766

...program "srsbuild" completed successfully.

Now the system seems to try to match all of the roughly 250,000
references against every linked database, and a test:
# getz '[swissprot-id:HA12_MOUSE]>embl'
EMBL:A02201
EMBL:MM190
EMBL:MMU47326

pulls out A02201, which should be a link to PIR.
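
The "roughly 250,000" figure can be checked against the two build logs
above. The following is only a small sketch in Python: the numbers are
copied from the srsbuild output in this message, while the variable and
function names are made up for illustration. It adds valid and invalid
references per target database for each run.

```python
# (valid, invalid) reference counts copied from the srsbuild logs above.
# The dictionary and function names are illustrative, not part of SRS.
without_s_unix = {
    "GENBANK": (109413, 158),
    "EMBL": (109396, 175),
    "PIR": (47054, 75),
    "PDB": (5753, 51),
    "OMIM": (3760, 0),
}
with_s_unix = {
    "GENBANK": (125252, 123397),
    "EMBL": (125155, 123494),
    "PIR": (49253, 199396),
    "PDB": (5753, 242896),
    "OMIM": (3766, 244883),
}


def references_checked(run):
    """Return valid + invalid references for each target database."""
    return {db: valid + invalid for db, (valid, invalid) in run.items()}


totals_without = references_checked(without_s_unix)
totals_with = references_checked(with_s_unix)

for db in with_s_unix:
    print(f"{db:8s} without -s unix: {totals_without[db]:7d}   "
          f"with -s unix: {totals_with[db]:7d}")
```

With "-s unix" every target database ends up with the same sum, 248649,
whereas without the option the sums differ per database. That is
consistent with every reference being tested against every database,
though it does not by itself show where in srsbuild this happens.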

Cheers,
Martin

-- 
-------------------------------------------------------------------
| Martin Hilbers http://www.dci.clrc.ac.uk/Person.asp?m.p.hilbers |
| SEQNET                |     E-mail: m.p.hilbers at dl.ac.uk        |
| Daresbury Laboratory  |     Tel:    +44-1925-603492             |
| Daresbury, Warrington |     Fax:    +44-1925-603100             |
| Cheshire WA4 4AD      | SEQNET is the UK national EMBNet node   |
| United Kingdom        |     http://www.seqnet.dl.ac.uk/         |
-------------------------------------------------------------------



