Genome databases, Bio-object directories and Grid computing

Don Gilbert gilbertd at bio.indiana.edu
Tue Nov 19 11:17:20 EST 2002


Over the past year, I've been looking at methods for building
distributed directories of biology data objects (from gene
sequences to genome and proteome features to biomedical
literature, phylogenetic and ontological classifications, and so
on).  Directories are an important part of bio-grid computing,
because they let us computationally search, retrieve, replicate and
make more effective use of the wealth of bio-data we now have.

Object directories need features that are usable by many
bioinformatics projects and centers, and that are robust to
the growing volumes and changing nature of biology data. Such
needs include:

 -- build on existing, practical technology for finding and organizing
 Internet-distributed objects

 -- be efficient and quick at handling the millions of bio objects, by the
 gigabyte and terabyte, that we want to use

 -- provide for queries that are distributed across the directories
 of collaborating services

 -- support existing and new data access mechanisms used in bioinformatics,
 including relational databases, object and XML databases, and
 bioinformatics-specific methods such as SRS, Entrez and AceDB

 -- provide simple methods for programmatic access to directories in
 a range of programming languages

 -- use a flexible, common schema for describing objects in directories

 -- be able to replicate directories and data objects among bioinformatics
 centers

 -- be able to build peer-to-peer data access systems for collaborative
 projects

 -- include current authentication and security technology to ensure
 appropriate data access

Recent work has focused on comparing the Lightweight Directory Access
Protocol (LDAP) with XML Web Services (SOAP, WSDL, UDDI and others).
I am currently using SRS as the backend data access system, since it
provides good access to millions of objects, in hundreds of gigabytes,
across hundreds of bio-databanks. You can find more of this work at
  http://iubio.bio.indiana.edu/biogrid/directories/

If you are interested in hooking any kind of bio-data access system
(relational databases, SRS, etc.) to such distributed directory methods,
drop me a line.

For those of you interested in joining our team to develop
this next generation of bio-data access and bio-grid
computing, we have bioinformatics postdoctoral positions
available.  See
http://chipmunk.bio.indiana.edu/~gilbertd/positions-open/genome-infosys-job.html
http://chipmunk.bio.indiana.edu/~gilbertd/about/

-- Don Gilbert

-- 
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at bio.indiana.edu
---

More information about the Bio-soft mailing list