Data directories for Grid computing: SRS-LDAP test

Don Gilbert gilbertd at bio.indiana.edu
Thu Jul 11 05:28:21 EST 2002


Dear folks w/ interests in bio-data distribution,

I've put together an experimental SRS6-LDAP gateway tool,
available at 
http://iubio.bio.indiana.edu/grid/directories/

This is at a very preliminary stage: it is just far enough
along to assess its efficiency for bulk bio-sequence search and
retrieval via LDAP (Lightweight Directory Access Protocol).

If you are interested in trying it out, there are two simple
ldapsearch programs (Java and Perl) here, which include test
cases against the IUBio Archive SRS system.  They should work on
most computers.  A basic LDAP URL for this is
  ldap://iubio.bio.indiana.edu:3895/srv=srs
which serves the same databanks as
  http://iubio.bio.indiana.edu/srs/
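
For anyone who wants to see how the LDAP URL above breaks down before
trying the Java or Perl clients, here is a minimal sketch in Python.
It is not one of the programs from the archive. It only splits the URL
into host, port, and search base, and builds an argument list for the
OpenLDAP ldapsearch command. The function names are made up for this
example, and the filter "(objectClass=*)" is a generic match-everything
placeholder, since the attribute names served by the gateway are not
described here.

```python
from urllib.parse import urlsplit

# The LDAP URL given in the post.
SRS_LDAP_URL = "ldap://iubio.bio.indiana.edu:3895/srv=srs"


def split_ldap_url(url):
    """Split an LDAP URL into (host, port, base_dn)."""
    parts = urlsplit(url)
    if parts.scheme != "ldap":
        raise ValueError("not an ldap:// URL: " + url)
    # The path component of an LDAP URL is the search base (DN).
    base_dn = parts.path.lstrip("/")
    # 389 is the standard LDAP port, used when the URL names none.
    return parts.hostname, parts.port or 389, base_dn


def ldapsearch_args(url, ldap_filter="(objectClass=*)"):
    """Build an ldapsearch argument list for the given URL.

    The filter is a placeholder assumption, not the gateway's schema.
    """
    host, port, base_dn = split_ldap_url(url)
    return [
        "ldapsearch",
        "-x",                              # simple authentication
        "-H", f"ldap://{host}:{port}",     # server URI
        "-b", base_dn,                     # search base
        ldap_filter,
    ]


if __name__ == "__main__":
    print(" ".join(ldapsearch_args(SRS_LDAP_URL)))
```

The argument list can be handed to subprocess.run() on a machine with
the OpenLDAP client tools installed; a real query would replace the
placeholder filter with one that matches the gateway's attributes.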

The source code for the SRS6-LDAP backend is included here, but I
haven't gotten the documentation to a usable point.  If anyone tries
this out, let me know where I can help.  Besides the SRS6 binary
distribution, you will need openldap2 (www.openldap.org) and a
C-Perl tool (for LDAP-to-SRS query conversion).

In comparison to the SRS6 tools "getz" (command line) and "wgetz"
(web backend), it looks efficient and capable.  E.g., it runs at
about the same speed over the network as the local getz program, and
about 3x faster than wgetz over HTTP, without adding any binary-
encoded sequence formats (something you can do in LDAP, which uses
binary-encoded ASN.1 as its transport stream).

E.g., it takes just a second or two to do a query, but the time to
retrieve sequences is roughly proportional to the number of sequence
records: 1.4 minutes for 20,000 records; 7 minutes for 350,000
records; 30 minutes for some 1.2 million records in GenBank
which mention 'human'.
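
To put the retrieval times above on a common scale, the small Python
sketch below converts them to records per minute. It uses only the
figures quoted in this message, treating "some 1.2 million" as exactly
1,200,000, which is an approximation. The rates are not constant, so
"proportional" should be read loosely; the smallest retrieval is likely
dominated by fixed per-query overhead, though that is a guess rather
than something measured.

```python
# (records retrieved, minutes taken), as quoted in the message.
# 1_200_000 stands in for "some 1.2 million" and is approximate.
timings = [
    (20_000, 1.4),
    (350_000, 7.0),
    (1_200_000, 30.0),
]


def records_per_minute(records, minutes):
    """Average retrieval rate for one test run."""
    return records / minutes


# Rounded rates, one per run, in the same order as `timings`.
rates = [round(records_per_minute(n, m)) for n, m in timings]

for (n, m), rate in zip(timings, rates):
    print(f"{n:>9,} records in {m:>4} min = about {rate:,} records/min")
```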

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu